-
Notifications
You must be signed in to change notification settings - Fork 195
perf: improve leaderboard mysql query #1050
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
d2ad5cd to
00ba835
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR optimizes MySQL leaderboard queries by reducing the number of columns in the ORDER BY clause from 4 to 2. The optimization addresses performance issues in spaces with large datasets where queries were taking 5-6 seconds due to MySQL using filesort on 1.2M rows.
Key changes:
- Implements context-aware ordering based on filter type (space, user, or no filter)
- Reduces ORDER BY columns to ensure uniqueness while minimizing query complexity
- Maintains deterministic ordering for pagination consistency
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
00ba835 to
c098357
Compare
Why are there duplicates? 🤔 We have a unique space-user list right? |
| const orderFields = [orderBy]; | ||
| if (where.space) { | ||
| orderFields.push('user'); | ||
| } else if (where.user) { | ||
| orderFields.push('space'); | ||
| } else { | ||
| orderFields.push('last_vote'); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What could go wrong if we do something like this?
| const orderFields = [orderBy]; | |
| if (where.space) { | |
| orderFields.push('user'); | |
| } else if (where.user) { | |
| orderFields.push('space'); | |
| } else { | |
| orderFields.push('last_vote'); | |
| } | |
| const orderFields = [orderBy]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential duplicates on pagination, and random ordering when 2 rows have same value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you have any example for leaderboard table?
with this approach, if where.space exists and orderBy is already user, we push user again, which will send only one order 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes
this case is for showing the leaderboard of a specific space, and ordering by user.
Since user is unique by space, this ordering does not need additional order fields to ensure uniqueness
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it is same on the other way right, space is also unique by user 🤔 leaderboard contain unique index ["user","space"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we just need additional order field only when order is not space/user? or when where is different from orderBy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we just need additional order field only when order is not space/user? or when
whereis different fromorderBy
In all cases, we need 2 order by fields, in case the first one has same values for multiple rows, to avoid doublons on pagination, and have a consistent ordering.
- On space dashboard:
where.spacesexist,order byis the field specified by the ui, likevote_count. If multiple users have the same number of votes, there may be duplicate when loading next pages. By using additionaluserin theorder by, we ensure this ordering is always unique, sinceuseris a unique value by space. - Same principle, but in opposite orders for user dashboard
- When neither
where.spacenorwhere.userexists, mean that this request is not coming from our UI, and coming from a direct custom graphql queries from a third party on the hub. In that case, we justlast_voteas fallback ordering, as it's the one to have least chance of duplicates.
In case that in space dashboard, the UI also sort by user, the uniq functions will remove the duplicate order by, so we don't have order by user, user.
What kind of other scenario are you thinking about, not covered by this PR ?
Fix https://discord.com/channels/955773041898573854/1030772407226601522/1407286799570702387
Situation
We're using a few fields in the
ORDER BYleaderbord query, to ensure that all results are always returned in a specific order, in case of duplicates on one or more order by field.This ensure consistent and unique results while paginating.
Issue
While those fields ensure consistency and uniqueness, it's not optimized for tables with huge amount of rows, since mysql is only able to use a single index on a table.
This creates queries that takes a long time to resolve for spaces with huge amount of rows, like stargate:
It's reading 1.2M of rows on each query, and using
filesortto sort all the results, resulting in queries taking 5-6 secondsFix
This PR will only use the strict minimum necessary columns to ensure uniqueness in returned result:
user(only one user per space)space(each space is unique per user)order bywill default tolast_voteThis results in only using 2 columns in
ORDER BYmaximum, instead of 4, and speed up the queries.Test
This query will resolve in less than 1s, instead of 6s.