-
Notifications
You must be signed in to change notification settings - Fork 90
Implement prefixed index to remove hashing from joins initial load #549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🦋 Changeset detectedLatest commit: c0674b1 The changes in this PR will be included in the next version bump. This PR includes changesets to release 13 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
More templates
@tanstack/angular-db
@tanstack/db
@tanstack/db-ivm
@tanstack/electric-db-collection
@tanstack/query-db-collection
@tanstack/react-db
@tanstack/rxdb-db-collection
@tanstack/solid-db
@tanstack/svelte-db
@tanstack/trailbase-db-collection
@tanstack/vue-db
commit: |
Size Change: 0 B Total Size: 68.4 kB ℹ️ View Unchanged
|
Size Change: 0 B Total Size: 1.44 kB ℹ️ View Unchanged
|
Huge improvement! |
I don't like that we're special casing the index to handle a specific structure (the prefix keys) that occurs in joins. The indexes already handle keyed streams, so could we move the row's PK into the stream's key instead? What i mean is returning this from the map(([currentKey, namespacedRow]) => {
// Extract the join key from the main table expression
const mainKey = compiledMainExpr(namespacedRow)
return [[mainKey, currentKey], namespacedRow]
}) Or, if the index isn't good at handling a tuple like that, we could create a single string key: return [`${mainKey}-${currentKey}`, namespacedRow] That way we don't need to special case the index implementation. |
The row PK is not the join key, the index is on the join key. This prefix is essentially something that's extracted from the row that uniquely identifies it, and allows us to skip the expensive structural hashing unless there are multiple prefixes for the same key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the sake of time i'm fine going forward with this implementation but could we keep the old implementation and use this implementation only for the joins? I think it makes sense to have a specialised prefix index implementation for joins and keep the old index implementation for operators that don't need the prefix. That means we can also slightly simplify this prefix implementation since the PrefixMap
will always have a TPrefix
(so no need for the NO_PREFIX
).
Unfortunately we can't as someone could still use the db-ivm package without the prefixed rows going into the join. I don't like the idea of having to maintain - and add tests for - multiple different index implementations. I would much prefer to have one that worked well until we found a real need to have multiple. |
Currently the ivm index system tries to avoid hashing values until its seen > one item for a specific key, this removes all hashing on initial load on one side of most joins, but we still end up hashing a lot of values of the other. Think of joining comments to issues, a many to one join, we end up hashing the comments.
When performing joins DB does this
map
just before the join:this means we see a "prefixed" row inside the join with the original PK or the row. We can use this to avoid hashing until we see > one item for both the join key and the prefix (the row PK).
This PR implements a new index that keeps track from keys and prefixes (when available) and entirely removes the hashing on the initial load (we don't send duplicate rows for a PK). It then uses the hashing only during the incrimental stage where this is significantly lower throughout.
before:
after:
Implementation notes: