Make go to relevant constant in time using an index #3526
Motivation
Closes #3327
Go to relevant file currently performs very slowly in large repositories, as pointed out in the issue.
Profiling with stackprof showed that the main problem is the call to Dir.glob that matches all the files with the desired patterns. This is disk I/O intensive and has O(N) time complexity, where N is the number of files in the repository.
This PR provides a proof of concept of how that time could be reduced.
Implementation
Create an index for the basename paths of the files. The index has the following structure:
{ "foo" => ["foo_test.rb", "test_foo.rb", "foo_spec.rb"] }
This index finds candidates for a given query in nearly constant time (assuming the number of matches K in the search result is low).
Integrate this new index in the RubyIndexer::Index.
Space Complexity: O(N), where N is the number of files in the repository
Time Complexity on boot: O(N)
Time Complexity on query: O(1)
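The index structure and the complexity figures above can be illustrated with a minimal sketch. This is not the code in this PR: the class name `RelevantFileIndex`, its methods, and the affix pattern are hypothetical, and it stores full paths as values instead of basenames so that results can be opened directly.

```ruby
# Hypothetical sketch; names are illustrative and not part of RubyIndexer::Index.
class RelevantFileIndex
  # Assumed test-file affixes. Stripping them gives source and test files the same key.
  TEST_AFFIX = /\A(?:test_|spec_)|(?:_test|_spec)\z/

  def initialize
    # normalized basename => array of file paths
    @entries = Hash.new { |hash, key| hash[key] = [] }
  end

  # Amortized O(1) per file, so O(N) on boot for N files.
  def add(path)
    @entries[key_for(path)] << path
  end

  # One hash lookup, plus O(K) to drop the queried file from the K candidates.
  def relevant_to(path)
    @entries.fetch(key_for(path), []).reject { |candidate| candidate == path }
  end

  private

  def key_for(path)
    File.basename(path, ".*").gsub(TEST_AFFIX, "")
  end
end
```

With `lib/foo.rb`, `test/foo_test.rb`, and `spec/foo_spec.rb` added, all three share the key `"foo"`, so a query for any one of them returns the other two without touching the disk.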
Assuming the 400K files mentioned in the issue this should consume approximately
Note: Alternatively, this index could be computed lazily, so that these resources are used only once the user invokes the feature for the first time.
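One way the lazy variant could look is sketched below, assuming the index is built with a single Dir.glob pass on the first query and memoized afterwards. The class `LazyRelevantFileIndex`, the `built?` helper, and the affix pattern are hypothetical and only illustrate the idea.

```ruby
# Hypothetical sketch: nothing is scanned or allocated until the first query.
class LazyRelevantFileIndex
  # Assumed test-file affixes, stripped to build the lookup key.
  TEST_AFFIX = /\A(?:test_|spec_)|(?:_test|_spec)\z/

  def initialize(root)
    @root = root
    @entries = nil # index not built yet
  end

  # Expects a path relative to the root, matching what Dir.glob returns below.
  def relevant_to(path)
    entries.fetch(key_for(path), []).reject { |candidate| candidate == path }
  end

  def built?
    !@entries.nil?
  end

  private

  # The first call pays the O(N) scan; later calls reuse the memoized hash.
  def entries
    @entries ||= Dir.glob("**/*.rb", base: @root).group_by { |file| key_for(file) }
  end

  def key_for(path)
    File.basename(path, ".*").gsub(TEST_AFFIX, "")
  end
end
```

The trade-off is that the first use of the feature pays the full scan, while users who never invoke it pay nothing.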
Automated Tests
Not for now, want to receive feedback first.
Manual Tests
Tested in the core Buk repository, which has more than 25K Ruby files. A benchmark is provided in this draft.
Notes