Make Go to Relevant File constant time using an index #3526

Draft: wants to merge 1 commit into main

domingo2000
Contributor

@domingo2000 domingo2000 commented May 22, 2025

Motivation

Closes #3327

Go to Relevant File is currently very slow in large repositories, as pointed out in the issue.

Profiling with stackprof showed that the main problem is the call to Dir.glob that matches all the files with the desired patterns. This call is disk I/O intensive and has O(N) time complexity, where N is the number of files in the repository.

This PR provides a proof of concept of how that time could be reduced.

Implementation

Create an index keyed by the basenames of the files. The index has the following structure:

  • Key: The basename without the test prefixes or suffixes (e.g. "foo")
  • Value: An array of all the files that match that basename (e.g. ["foo_test.rb", "test_foo.rb", "foo_spec.rb"])

This index finds candidates for a given query in nearly constant time (assuming K, the number of matches for the query, is low).
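
The structure described above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the class name `RelevantFileIndex`, its method names, and the exact affix pattern (`test_` prefix, `_test` and `_spec` suffixes, taken from the examples above) are assumptions.

```ruby
# Hypothetical sketch of a basename index; names are illustrative only.
class RelevantFileIndex
  # Affixes assumed from the examples: test_foo.rb, foo_test.rb, foo_spec.rb.
  TEST_AFFIX = /\Atest_|(?:_test|_spec)\z/

  def initialize
    # Key: normalized basename, value: every path sharing that basename.
    @entries = Hash.new { |hash, key| hash[key] = [] }
  end

  # O(1) per file, so building the index over N files is O(N).
  def add(path)
    @entries[normalize(path)] << path
  end

  # A single hash lookup, then a scan over the K paths in that bucket.
  def candidates(path)
    @entries.fetch(normalize(path), []).reject { |other| other == path }
  end

  private

  # "test/foo_test.rb" -> "foo", "test/test_foo.rb" -> "foo", "lib/foo.rb" -> "foo"
  def normalize(path)
    File.basename(path, ".*").gsub(TEST_AFFIX, "")
  end
end

index = RelevantFileIndex.new
%w[lib/foo.rb test/foo_test.rb test/test_foo.rb spec/foo_spec.rb lib/bar.rb].each do |path|
  index.add(path)
end

index.candidates("lib/foo.rb")
# => ["test/foo_test.rb", "test/test_foo.rb", "spec/foo_spec.rb"]
```

Using `fetch` with a default for reads avoids creating empty buckets for basenames that were never indexed.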

Integrate this new index into RubyIndexer::Index.

Space complexity: O(N), where N is the number of files in the repository
Time complexity on boot: O(N)
Time complexity on query: O(1)

Assuming the 400K files mentioned in the issue, this should cost approximately:

  • 40 MB more memory for the index (about 100 bytes per file)
  • 1-7 seconds more in initial indexing (the time the glob takes when starting)

Note: Alternatively, this index could be computed lazily, so these resources are only used once the user invokes the feature for the first time.
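
A minimal sketch of the lazy alternative, assuming the index is a plain Hash produced by a builder block. The class name `LazyIndex` and its methods are hypothetical and not part of this PR; thread safety is not handled here.

```ruby
# Hypothetical wrapper that defers building the index until the first query.
class LazyIndex
  def initialize(&builder)
    @builder = builder # block that returns a Hash of basename => paths
    @index = nil
  end

  # True once the builder has run.
  def built?
    !@index.nil?
  end

  # The O(N) build cost is paid on the first lookup only.
  def lookup(key)
    @index ||= @builder.call
    @index.fetch(key, [])
  end
end

lazy_index = LazyIndex.new do
  # Stand-in for the real directory scan.
  { "foo" => ["test/foo_test.rb", "spec/foo_spec.rb"] }
end

lazy_index.built?        # => false
lazy_index.lookup("foo") # => ["test/foo_test.rb", "spec/foo_spec.rb"]
lazy_index.built?        # => true
```

The trade-off is that the first Go to Relevant File request absorbs the build time instead of boot.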

Automated Tests

None yet; I want to receive feedback first.

Manual Tests

Tested in the main Buk repository, which has more than 25K Ruby files. A benchmark is included in this draft; the results from that repository:

                      user     system      total        real
old-glob          1.019865   0.292442   1.312307 (  1.312549)
new-index         0.000049   0.000007   0.000056 (  0.000056)
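
A comparison of this kind can be reproduced in miniature with Ruby's Benchmark module. The sketch below is not the benchmark from this draft: the helper names, the glob pattern, and the temporary directory of empty files are all assumptions made for illustration, and timings on such a small tree will differ from the numbers above.

```ruby
require "benchmark"
require "tmpdir"
require "fileutils"

# Old approach (assumed pattern): walk the tree on every query.
def glob_lookup(root, name)
  pattern = File.join(root, "**", "{#{name}_test,test_#{name},#{name}_spec}.rb")
  Dir.glob(pattern).sort
end

# New approach: walk the tree once and group paths by normalized basename.
def build_index(root)
  index = Hash.new { |hash, key| hash[key] = [] }
  Dir.glob(File.join(root, "**", "*.rb")).each do |path|
    key = File.basename(path, ".rb").gsub(/\Atest_|(?:_test|_spec)\z/, "")
    index[key] << path
  end
  index
end

def run_benchmark(file_count: 200)
  Dir.mktmpdir do |root|
    file_count.times do |i|
      FileUtils.touch(File.join(root, "file#{i}.rb"))
      FileUtils.touch(File.join(root, "file#{i}_test.rb"))
    end

    index = build_index(root)

    Benchmark.bm(10) do |x|
      x.report("old-glob") { glob_lookup(root, "file42") }
      x.report("new-index") { index.fetch("file42", []) }
    end
  end
end

run_benchmark if $PROGRAM_NAME == __FILE__
```

Note that the index is built outside the timed blocks, so this measures query time only, not the one-time O(N) build.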

Notes

  • The current draft does not have type annotations
  • The current draft was largely written by an LLM to get a quick proof of concept; I need to review all the changes before a final version.

@domingo2000 domingo2000 requested a review from a team as a code owner May 22, 2025 17:02
@domingo2000 domingo2000 marked this pull request as draft May 22, 2025 17:02
