Implement prefixed index to remove hashing from joins initial load #549

samwillis · 2025-09-13T16:54:15Z

Currently, the IVM index system tries to avoid hashing values until it has seen more than one item for a specific key. This removes all hashing during the initial load on one side of most joins, but we still end up hashing many values on the other side. Think of joining comments to issues: in a many-to-one join like this, we end up hashing the comments.
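To make the current behavior concrete, here is a minimal TypeScript sketch of a lazily hashing index. It is not the db-ivm ValueIndex/HashIndex code: `LazyHashIndex` and `structuralHash` are hypothetical names, JSON serialization stands in for the real structural hash, and removals and compaction are ignored. Each key holds one value unhashed; a second value for the same key forces both to be hashed, so in a comments-to-issues join the issues side stays unhashed while every comment is hashed.

```typescript
// Hypothetical stand-in for the real structural hash; counts how often it runs.
let hashCalls = 0
function structuralHash(value: unknown): string {
  hashCalls++
  return JSON.stringify(value)
}

// Per key: one unhashed value, or a hashed multiset once a second value arrives.
type Slot<V> =
  | { kind: 'single'; value: V; multiplicity: number }
  | { kind: 'hashed'; values: Map<string, [V, number]> }

class LazyHashIndex<K, V> {
  private slots = new Map<K, Slot<V>>()

  add(key: K, value: V, multiplicity = 1): void {
    const slot = this.slots.get(key)
    if (slot === undefined) {
      // First value for this key: store it without hashing.
      this.slots.set(key, { kind: 'single', value, multiplicity })
      return
    }
    let values: Map<string, [V, number]>
    if (slot.kind === 'single') {
      // Second value for this key: the held value must now be hashed too.
      values = new Map<string, [V, number]>()
      values.set(structuralHash(slot.value), [slot.value, slot.multiplicity])
      this.slots.set(key, { kind: 'hashed', values })
    } else {
      values = slot.values
    }
    const h = structuralHash(value)
    values.set(h, [value, (values.get(h)?.[1] ?? 0) + multiplicity])
  }
}

// Join comments to issues on issueId: 3 issues, 4 comments per issue.
type IssueRow = { id: number }
type CommentRow = { id: number; issueId: number }

const issueIndex = new LazyHashIndex<number, IssueRow>()
for (let id = 1; id <= 3; id++) issueIndex.add(id, { id })
const issueHashes = hashCalls // 0: each issue id appears once

const commentIndex = new LazyHashIndex<number, CommentRow>()
for (let id = 1; id <= 12; id++) {
  const issueId = ((id - 1) % 3) + 1
  commentIndex.add(issueId, { id, issueId })
}
const commentHashes = hashCalls - issueHashes // 12: every comment is hashed
```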

When performing joins, DB applies this map just before the join:

    map(([currentKey, namespacedRow]) => {
      // Extract the join key from the main table expression
      const mainKey = compiledMainExpr(namespacedRow)

      // Return [joinKey, [originalKey, namespacedRow]]
      return [mainKey, [currentKey, namespacedRow]]
    })

This means that inside the join we see a "prefixed" row carrying the original PK of the row. We can use this to avoid hashing until we see more than one item for the same join key and the same prefix (the row PK).
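As a concrete illustration, the snippet below applies the same reshaping to a few hypothetical comment rows. It uses plain `Array.prototype.map` rather than the db-ivm `map` operator, `compiledMainExpr` is a stand-in that reads `comments.issueId`, and the namespaced row shape is assumed. Two comments share join key `i1`, but their prefixes (`c1`, `c2`) still differ, which is what lets the index tell them apart without hashing.

```typescript
type Row = Record<string, unknown>

// Hypothetical input: comment rows keyed by their primary key, namespaced by table alias.
const input: [string, Row][] = [
  ['c1', { comments: { id: 'c1', issueId: 'i1' } }],
  ['c2', { comments: { id: 'c2', issueId: 'i1' } }],
  ['c3', { comments: { id: 'c3', issueId: 'i2' } }],
]

// Stand-in for compiledMainExpr: reads comments.issueId as the join key.
const compiledMainExpr = (row: Row): string =>
  (row.comments as { issueId: string }).issueId

// Same reshaping as the map above: [joinKey, [originalKey, namespacedRow]]
const prefixed = input.map(([currentKey, namespacedRow]): [string, [string, Row]] => {
  const mainKey = compiledMainExpr(namespacedRow)
  return [mainKey, [currentKey, namespacedRow]]
})

for (const [joinKey, [prefix]] of prefixed) {
  console.log(`join key ${joinKey}, prefix ${prefix}`)
}
// join key i1, prefix c1
// join key i1, prefix c2
// join key i2, prefix c3
```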

This PR implements a new index that keeps track of keys and prefixes (when available) and entirely removes hashing from the initial load (we never send duplicate rows for a PK). It then uses hashing only during the incremental stage, where the volume of changes is significantly lower.

before:

Dataset Size | Projects | Issues | Comments | Initial Load (ms) | Changes | Incremental (ms)
-------------|----------|-------|----------|-------------------|---------|------------------
Small        |       10 |     50 |       200 |              7.03 |      26 |             0.05
Medium       |       50 |    250 |      1000 |             20.83 |     130 |             0.10
Large        |      250 |   1250 |      5000 |             92.48 |     650 |             0.30
Very Large   |     1250 |   6250 |     25000 |            523.47 |    3250 |             1.42
Huge         |     6250 |  31250 |    125000 |           2867.57 |   16250 |            12.53

after:

Dataset Size | Projects | Issues | Comments | Initial Load (ms) | Changes | Incremental (ms)
-------------|----------|-------|----------|-------------------|---------|------------------
Small        |       10 |     50 |       200 |              3.83 |      26 |             0.05
Medium       |       50 |    250 |      1000 |              9.37 |     130 |             0.10
Large        |      250 |   1250 |      5000 |             36.30 |     650 |             0.86
Very Large   |     1250 |   6250 |     25000 |            181.34 |    3250 |             1.47
Huge         |     6250 |  31250 |    125000 |           1068.41 |   16250 |            15.50
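The initial-load speedup implied by the two tables is simple arithmetic; this snippet divides each "before" time by the corresponding "after" time, using the values copied from the tables.

```typescript
// Initial-load times in ms: [before, after], copied from the tables above.
const initialLoad: Record<string, [number, number]> = {
  Small: [7.03, 3.83],
  Medium: [20.83, 9.37],
  Large: [92.48, 36.3],
  'Very Large': [523.47, 181.34],
  Huge: [2867.57, 1068.41],
}

// Speedup = before / after for each dataset size.
const speedups: Record<string, number> = {}
for (const [size, [before, after]] of Object.entries(initialLoad)) {
  speedups[size] = before / after
  console.log(`${size}: ${speedups[size].toFixed(2)}x`)
}
// Small: 1.84x
// Medium: 2.22x
// Large: 2.55x
// Very Large: 2.89x
// Huge: 2.68x
```

In the same tables, incremental times are unchanged for Small and Medium and higher for Large, Very Large and Huge.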

Implementation notes:

I combined the previous ValueIndex and HashIndex into a single index, as adding prefixes to both would have meant duplicating that logic. This gives a clear progression for a key: it starts with a single value, moves to multiple values split by prefix, and then on to multiple values for a single prefix using hashing.
The HashIndex handled the multiple-values -> single-value transition and moved the value back to the single-value index. I have left that out of this new version for now, as it seems like a rare edge case that only happens during incremental changes. I think the complexity it would add to the code is not justified by the performance impact it would have.
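The three-step progression described in these notes can be sketched as a single TypeScript index. This is a minimal sketch, not the db-ivm implementation: `PrefixIndex`, `structuralHash` and this particular `NO_PREFIX` sentinel are hypothetical (the discussion mentions a `NO_PREFIX` and a `PrefixMap`, but their definitions are not shown), the prefix is passed as an explicit argument rather than extracted from a `[prefix, row]` value, JSON serialization stands in for structural hashing, and entries are never compacted or moved back to a lower tier.

```typescript
// Hypothetical sentinel for values that arrive without a prefix.
const NO_PREFIX = Symbol('NO_PREFIX')

// Hypothetical stand-in for the real structural hash; counts how often it runs.
let hashCalls = 0
function structuralHash(value: unknown): string {
  hashCalls++
  return JSON.stringify(value)
}

// Tier 3: values sharing one prefix, told apart by hash.
type Hashed<V> = Map<string, [V, number]>
// Per prefix: one unhashed value, or the hashed multiset.
type PrefixSlot<V> = { value: V; multiplicity: number } | Hashed<V>
// Per key: tier 1 (a single value) or tier 2 (values split by prefix).
type KeySlot<V> =
  | { kind: 'single'; prefix: unknown; value: V; multiplicity: number }
  | { kind: 'prefixed'; byPrefix: Map<unknown, PrefixSlot<V>> }

class PrefixIndex<K, V> {
  private slots = new Map<K, KeySlot<V>>()

  // The prefix is passed explicitly here; values without one share NO_PREFIX.
  add(key: K, value: V, multiplicity = 1, prefix: unknown = NO_PREFIX): void {
    const slot = this.slots.get(key)
    if (slot === undefined) {
      // Tier 1: first value for this key, stored without hashing.
      this.slots.set(key, { kind: 'single', prefix, value, multiplicity })
      return
    }
    let byPrefix: Map<unknown, PrefixSlot<V>>
    if (slot.kind === 'single') {
      // Tier 2: a second value arrived, so split this key's values by prefix.
      byPrefix = new Map<unknown, PrefixSlot<V>>()
      byPrefix.set(slot.prefix, { value: slot.value, multiplicity: slot.multiplicity })
      this.slots.set(key, { kind: 'prefixed', byPrefix })
    } else {
      byPrefix = slot.byPrefix
    }
    const existing = byPrefix.get(prefix)
    if (existing === undefined) {
      // New prefix for this key: still no hashing.
      byPrefix.set(prefix, { value, multiplicity })
      return
    }
    // Tier 3: same key and same prefix again, so fall back to hashing.
    let hashed: Hashed<V>
    if (existing instanceof Map) {
      hashed = existing
    } else {
      hashed = new Map<string, [V, number]>()
      hashed.set(structuralHash(existing.value), [existing.value, existing.multiplicity])
      byPrefix.set(prefix, hashed)
    }
    const h = structuralHash(value)
    hashed.set(h, [value, (hashed.get(h)?.[1] ?? 0) + multiplicity])
  }

  // All [value, multiplicity] pairs stored under a key (zero entries are kept).
  get(key: K): [V, number][] {
    const slot = this.slots.get(key)
    if (slot === undefined) return []
    if (slot.kind === 'single') return [[slot.value, slot.multiplicity]]
    const out: [V, number][] = []
    slot.byPrefix.forEach((entry) => {
      if (entry instanceof Map) entry.forEach((pair) => out.push(pair))
      else out.push([entry.value, entry.multiplicity])
    })
    return out
  }
}

// Initial load: 3 issues x 4 comments, keyed by issueId and prefixed by comment id.
type CommentRow = { id: string; body: string }
const commentIndex = new PrefixIndex<string, CommentRow>()
for (let n = 1; n <= 12; n++) {
  const id = `c${n}`
  commentIndex.add(`i${((n - 1) % 3) + 1}`, { id, body: 'v1' }, 1, id)
}
const initialLoadHashes = hashCalls // 0: no (key, prefix) pair repeats

// Incremental update of c1: retract the old version, insert the new one.
commentIndex.add('i1', { id: 'c1', body: 'v1' }, -1, 'c1')
commentIndex.add('i1', { id: 'c1', body: 'v2' }, 1, 'c1')
const incrementalHashes = hashCalls - initialLoadHashes // 3
```

With unique PKs, the initial load performs no hashing; an update that retracts and re-inserts the same PK is the first point at which a prefix bucket holds more than one value. Values added without a prefix all share `NO_PREFIX`, so in this sketch they fall back to the previous per-key behavior, which is why the sentinel stays useful when db-ivm is used without prefixed rows.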

changeset-bot · 2025-09-13T16:54:18Z

🦋 Changeset detected

Latest commit: c0674b1

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 13 packages

Name | Type
-----|-----
@tanstack/db-ivm | Patch
@tanstack/db | Patch
@tanstack/angular-db | Patch
@tanstack/electric-db-collection | Patch
@tanstack/query-db-collection | Patch
@tanstack/react-db | Patch
@tanstack/rxdb-db-collection | Patch
@tanstack/solid-db | Patch
@tanstack/svelte-db | Patch
@tanstack/trailbase-db-collection | Patch
@tanstack/vue-db | Patch
todos | Patch
@tanstack/db-example-react-todo | Patch


pkg-pr-new · 2025-09-13T16:55:52Z


@tanstack/angular-db

npm i https://pkg.pr.new/@tanstack/angular-db@549

@tanstack/db

npm i https://pkg.pr.new/@tanstack/db@549

@tanstack/db-ivm

npm i https://pkg.pr.new/@tanstack/db-ivm@549

@tanstack/electric-db-collection

npm i https://pkg.pr.new/@tanstack/electric-db-collection@549

@tanstack/query-db-collection

npm i https://pkg.pr.new/@tanstack/query-db-collection@549

@tanstack/react-db

npm i https://pkg.pr.new/@tanstack/react-db@549

@tanstack/rxdb-db-collection

npm i https://pkg.pr.new/@tanstack/rxdb-db-collection@549

@tanstack/solid-db

npm i https://pkg.pr.new/@tanstack/solid-db@549

@tanstack/svelte-db

npm i https://pkg.pr.new/@tanstack/svelte-db@549

@tanstack/trailbase-db-collection

npm i https://pkg.pr.new/@tanstack/trailbase-db-collection@549

@tanstack/vue-db

npm i https://pkg.pr.new/@tanstack/vue-db@549

commit: c0674b1

github-actions · 2025-09-13T16:57:24Z

Size Change: 0 B

Total Size: 68.4 kB


Filename | Size
---------|-----
`./packages/db/dist/esm/change-events.js` | 1.13 kB
`./packages/db/dist/esm/collection-events.js` | 672 B
`./packages/db/dist/esm/collection.js` | 10.9 kB
`./packages/db/dist/esm/deferred.js` | 230 B
`./packages/db/dist/esm/errors.js` | 3.1 kB
`./packages/db/dist/esm/index.js` | 1.55 kB
`./packages/db/dist/esm/indexes/auto-index.js` | 745 B
`./packages/db/dist/esm/indexes/base-index.js` | 605 B
`./packages/db/dist/esm/indexes/btree-index.js` | 1.74 kB
`./packages/db/dist/esm/indexes/lazy-index.js` | 1.25 kB
`./packages/db/dist/esm/local-only.js` | 827 B
`./packages/db/dist/esm/local-storage.js` | 2.02 kB
`./packages/db/dist/esm/optimistic-action.js` | 294 B
`./packages/db/dist/esm/proxy.js` | 3.87 kB
`./packages/db/dist/esm/query/builder/functions.js` | 615 B
`./packages/db/dist/esm/query/builder/index.js` | 3.93 kB
`./packages/db/dist/esm/query/builder/ref-proxy.js` | 938 B
`./packages/db/dist/esm/query/compiler/evaluators.js` | 1.52 kB
`./packages/db/dist/esm/query/compiler/expressions.js` | 631 B
`./packages/db/dist/esm/query/compiler/group-by.js` | 2.08 kB
`./packages/db/dist/esm/query/compiler/index.js` | 2.27 kB
`./packages/db/dist/esm/query/compiler/joins.js` | 2.52 kB
`./packages/db/dist/esm/query/compiler/order-by.js` | 1.23 kB
`./packages/db/dist/esm/query/compiler/select.js` | 1.28 kB
`./packages/db/dist/esm/query/ir.js` | 508 B
`./packages/db/dist/esm/query/live-query-collection.js` | 333 B
`./packages/db/dist/esm/query/live/collection-config-builder.js` | 2.59 kB
`./packages/db/dist/esm/query/live/collection-subscriber.js` | 2.4 kB
`./packages/db/dist/esm/query/optimizer.js` | 3.05 kB
`./packages/db/dist/esm/SortedMap.js` | 1.24 kB
`./packages/db/dist/esm/transactions.js` | 3.03 kB
`./packages/db/dist/esm/utils.js` | 943 B
`./packages/db/dist/esm/utils/btree.js` | 6.02 kB
`./packages/db/dist/esm/utils/comparison.js` | 718 B
`./packages/db/dist/esm/utils/index-optimization.js` | 1.62 kB


github-actions · 2025-09-13T16:58:56Z

Size Change: 0 B

Total Size: 1.44 kB


Filename | Size
---------|-----
`./packages/react-db/dist/esm/index.js` | 152 B
`./packages/react-db/dist/esm/useLiveQuery.js` | 1.28 kB


KyleAMathews · 2025-09-13T18:05:52Z

Huge improvement!

kevin-dp · 2025-09-15T07:28:14Z

I don't like that we're special-casing the index to handle a specific structure (the prefix keys) that occurs in joins. The indexes already handle keyed streams, so could we move the row's PK into the stream's key instead? What I mean is returning this from the map operator:

    map(([currentKey, namespacedRow]) => {
      // Extract the join key from the main table expression
      const mainKey = compiledMainExpr(namespacedRow)

      return [[mainKey, currentKey], namespacedRow]
    })

Or, if the index isn't good at handling a tuple like that, we could create a single string key:

    return [`${mainKey}-${currentKey}`, namespacedRow]

That way we don't need to special case the index implementation.

samwillis · 2025-09-15T08:03:54Z

The row PK is not the join key; the index is on the join key. The prefix is essentially something extracted from the row that uniquely identifies it, and it allows us to skip the expensive structural hashing unless there are multiple values for the same key and prefix.
Putting the prefix in the main key (join key) would break joins.
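A toy equi-join (not the db-ivm join operator) illustrates the point: rows are matched on equal keys, so folding each row's PK into the key, as in the `${mainKey}-${currentKey}` suggestion, gives every row a key the other side never produces. The `Row` type, `joinOnKey` helper and data are hypothetical.

```typescript
type Row = { id: string; issueId?: string }

// Toy equi-join over [key, row] entries: pairs left and right rows with equal keys.
function joinOnKey(left: [string, Row][], right: [string, Row][]): [Row, Row][] {
  const rightByKey = new Map<string, Row[]>()
  for (const [k, row] of right) rightByKey.set(k, [...(rightByKey.get(k) ?? []), row])
  const out: [Row, Row][] = []
  for (const [k, row] of left) {
    for (const match of rightByKey.get(k) ?? []) out.push([row, match])
  }
  return out
}

const comments: Row[] = [{ id: 'c1', issueId: 'i1' }, { id: 'c2', issueId: 'i1' }]
const issues: Row[] = [{ id: 'i1' }]

// Keyed by the join key only: issueId on the comment side, id on the issue side.
const byJoinKey = joinOnKey(
  comments.map((c): [string, Row] => [c.issueId!, c]),
  issues.map((i): [string, Row] => [i.id, i]),
)

// Each row's PK folded into the key: 'i1-c1' and 'i1-c2' never equal 'i1-i1'.
const byCompositeKey = joinOnKey(
  comments.map((c): [string, Row] => [`${c.issueId}-${c.id}`, c]),
  issues.map((i): [string, Row] => [`${i.id}-${i.id}`, i]),
)

console.log(byJoinKey.length, byCompositeKey.length) // 2 0
```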

kevin-dp

For the sake of time, I'm fine going forward with this implementation, but could we keep the old implementation and use this implementation only for the joins? I think it makes sense to have a specialised prefix index implementation for joins and keep the old index implementation for operators that don't need the prefix. That means we can also slightly simplify this prefix implementation since the PrefixMap will always have a TPrefix (so no need for the NO_PREFIX).

samwillis · 2025-09-15T12:00:19Z

> That means we can also slightly simplify this prefix implementation since the PrefixMap will always have a TPrefix (so no need for the NO_PREFIX).

Unfortunately we can't, as someone could still use the db-ivm package without feeding prefixed rows into the join.

I don't like the idea of having to maintain - and add tests for - multiple different index implementations. I would much prefer to have one that worked well until we found a real need to have multiple.

impliment prefixed index to remove hashing from joins initial load

4f1dcbc

Merge branch 'main' into samwillis/ivm-prefix-index

7cd6701

samwillis added 2 commits September 13, 2025 19:24

comments

a6b2ea9

changeset

d583ced

samwillis marked this pull request as ready for review September 13, 2025 18:32

samwillis requested a review from kevin-dp September 13, 2025 18:32

kevin-dp approved these changes Sep 15, 2025


samwillis added 2 commits September 18, 2025 12:57

Allow ValueMap for a key without a PrefixMap

0b2785c

refactor

c0674b1

samwillis mentioned this pull request Sep 19, 2025

refactor joins to use direct implementation or each type rather than composition of inner+anti joins #571

Open
