trail sharding (spatial + time buckets) to bypass Meilisearch’s 1000‑hit limit #731
To work around Meilisearch's 1k-hit limit, I tried implementing a clustering strategy. It is based on two dimensions:
Time: Trails are sorted by creation date and grouped into time buckets of at most 1k trails. Each trail is assigned to exactly one time bucket.
Space: The map (bbox) is divided into nodes, starting from a single node that covers the entire globe. As soon as a node contains more than 1k trails, it is split into four subnodes. (The start node is usually split at -30/0 rather than 0/0, so that the split runs through the Atlantic rather than through Europe.) A trail can belong to several nodes if it crosses node boundaries.
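The two dimensions can be sketched roughly as follows. This is a minimal illustration in Python, not the code from this PR: the `Trail`, `Node`, `time_buckets`, `build`, and `leaves` names are hypothetical. It assumes that "-30/0" means longitude -30 and latitude 0, that deeper nodes split at their midpoint, that a trail belongs to a node when their bounding boxes intersect, and that a `max_depth` guard is needed to stop the recursion when many trails overlap the same spot.

```python
from dataclasses import dataclass, field
from datetime import datetime

MAX_HITS = 1000  # per-cluster cap, matching the 1k-hit limit discussed above

@dataclass
class Trail:
    id: str
    created: datetime
    bbox: tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)

def time_buckets(trails, limit=MAX_HITS):
    """Sort by creation date and give each trail exactly one bucket index."""
    ordered = sorted(trails, key=lambda t: t.created)
    return {t.id: i // limit for i, t in enumerate(ordered)}

def intersects(a, b):
    """True if two (min_lon, min_lat, max_lon, max_lat) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

@dataclass
class Node:
    bbox: tuple
    trails: list = field(default_factory=list)    # filled for leaf nodes only
    children: list = field(default_factory=list)  # four subnodes after a split

WORLD = (-180.0, -90.0, 180.0, 90.0)

def build(bbox, trails, limit=MAX_HITS, depth=0, max_depth=16):
    """Recursively split a node into four subnodes while it holds too many trails."""
    node = Node(bbox)
    inside = [t for t in trails if intersects(t.bbox, bbox)]
    if len(inside) <= limit or depth >= max_depth:
        node.trails = inside
        return node
    min_lon, min_lat, max_lon, max_lat = bbox
    if depth == 0:
        # Assumed reading of "-30/0": split the globe in the Atlantic.
        split_lon, split_lat = -30.0, 0.0
    else:
        # Assumption: deeper nodes split at their midpoint.
        split_lon, split_lat = (min_lon + max_lon) / 2, (min_lat + max_lat) / 2
    quads = [
        (min_lon, min_lat, split_lon, split_lat),  # south-west
        (split_lon, min_lat, max_lon, split_lat),  # south-east
        (min_lon, split_lat, split_lon, max_lat),  # north-west
        (split_lon, split_lat, max_lon, max_lat),  # north-east
    ]
    # A trail crossing a split line intersects several quads and lands in each.
    node.children = [build(q, inside, limit, depth + 1, max_depth) for q in quads]
    return node

def leaves(node):
    """Yield all leaf nodes, i.e. the spatial clusters that would be queried."""
    if not node.children:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)
```

With this layout, a trail has exactly one time bucket but may appear in several spatial leaves, so a consumer that merges results from neighbouring nodes would need to deduplicate by trail id.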
All clusters are global in scope. Experiments with per-user clusters were unsuccessful because of problems with migration speed, data handling, write speed, and so on. In my tests with 3k trails and five users, the global approach showed no disadvantage in terms of speed.