samwillis
Collaborator

@samwillis samwillis commented Sep 13, 2025

Currently the ivm index system tries to avoid hashing values until it has seen more than one item for a specific key. This removes all hashing on the initial load for one side of most joins, but we still end up hashing a lot of values on the other side. Think of joining comments to issues, a many-to-one join: we end up hashing the comments.

When performing joins, DB applies this map just before the join:

    map(([currentKey, namespacedRow]) => {
      // Extract the join key from the main table expression
      const mainKey = compiledMainExpr(namespacedRow)

      // Return [joinKey, [originalKey, namespacedRow]]
      return [mainKey, [currentKey, namespacedRow]]
    })

This means we see a "prefixed" row inside the join, carrying the original PK of the row. We can use this to avoid hashing until we see more than one item for both the join key and the prefix (the row PK).

This PR implements a new index that keeps track of keys and prefixes (when available) and entirely removes hashing on the initial load (we don't send duplicate rows for a PK). Hashing is then used only during the incremental stage, where throughput is significantly lower.

before:

Dataset Size | Projects | Issues | Comments | Initial Load (ms) | Changes | Incremental (ms)
-------------|----------|--------|----------|-------------------|---------|-----------------
Small        |       10 |     50 |      200 |              7.03 |      26 |             0.05
Medium       |       50 |    250 |     1000 |             20.83 |     130 |             0.10
Large        |      250 |   1250 |     5000 |             92.48 |     650 |             0.30
Very Large   |     1250 |   6250 |    25000 |            523.47 |    3250 |             1.42
Huge         |     6250 |  31250 |   125000 |           2867.57 |   16250 |            12.53

after:

Dataset Size | Projects | Issues | Comments | Initial Load (ms) | Changes | Incremental (ms)
-------------|----------|--------|----------|-------------------|---------|-----------------
Small        |       10 |     50 |      200 |              3.83 |      26 |             0.05
Medium       |       50 |    250 |     1000 |              9.37 |     130 |             0.10
Large        |      250 |   1250 |     5000 |             36.30 |     650 |             0.86
Very Large   |     1250 |   6250 |    25000 |            181.34 |    3250 |             1.47
Huge         |     6250 |  31250 |   125000 |           1068.41 |   16250 |            15.50
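
For a quick read of the two tables, the initial-load speedup per dataset can be computed directly from the numbers above. This is only a convenience sketch; the variable names are made up for illustration and the timings are copied from the tables as-is.

```typescript
// Initial load timings (ms) copied from the "before" and "after" tables.
const initialLoadMs: Array<[string, number, number]> = [
  ["Small", 7.03, 3.83],
  ["Medium", 20.83, 9.37],
  ["Large", 92.48, 36.3],
  ["Very Large", 523.47, 181.34],
  ["Huge", 2867.57, 1068.41],
]

// Speedup = before / after; values above 1 mean the new index is faster.
const speedups = initialLoadMs.map(([dataset, before, after]) => ({
  dataset,
  ratio: before / after,
}))

for (const { dataset, ratio } of speedups) {
  console.log(`${dataset}: ${ratio.toFixed(2)}x`)
}
```

On these numbers the initial load is roughly 1.8x to 2.9x faster, depending on dataset size.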

Implementation notes:

  • I combined the previous ValueIndex and HashIndex into a single index, as adding the prefixes to both would have meant duplication. This gives a clear progression for a key: it starts with a single value, moves to multiple values distinguished by prefix, and then on to multiple values for a single prefix using hashing.
  • The HashIndex handled the multiple-values -> single-value transition and moved the value back to the single value index. I have left that out of this new version for now, as it seems like a rare edge case that only happens during incremental changes. I think the complexity it would bring to the code is not balanced by the performance impact it would have.
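
The progression described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `SketchIndex`, `IndexEntry`, `structuralHash` and `hashCalls` are hypothetical names, `JSON.stringify` stands in for the real structural hash, and removal (negative multiplicities, moving back to a single value) is left out.

```typescript
// Hypothetical sketch: names and structure are illustrative, not the db-ivm code.
type IndexEntry<V> = { value: V; multiplicity: number }
// Per prefix: a single entry, or several entries keyed by structural hash.
type PrefixSlot<V> = IndexEntry<V> | Map<string, IndexEntry<V>>
// Per key: a single prefixed entry, or a map from prefix to PrefixSlot.
type KeySlot<P, V> = { prefix: P; entry: IndexEntry<V> } | Map<P, PrefixSlot<V>>

// Stand-in for structural hashing; the counter shows when hashing happens.
let hashCalls = 0
function structuralHash(value: unknown): string {
  hashCalls++
  return JSON.stringify(value)
}

class SketchIndex<K, P, V> {
  private keys = new Map<K, KeySlot<P, V>>()

  add(key: K, prefix: P, value: V, multiplicity = 1): void {
    const slot = this.keys.get(key)
    if (slot === undefined) {
      // Stage 1: first value for this key, stored without hashing.
      this.keys.set(key, { prefix, entry: { value, multiplicity } })
      return
    }
    let byPrefix: Map<P, PrefixSlot<V>>
    if (slot instanceof Map) {
      byPrefix = slot
    } else {
      // Stage 2: a second value for this key; switch to a map keyed by prefix.
      byPrefix = new Map<P, PrefixSlot<V>>([[slot.prefix, slot.entry]])
      this.keys.set(key, byPrefix)
    }
    const existing = byPrefix.get(prefix)
    if (existing === undefined) {
      byPrefix.set(prefix, { value, multiplicity })
      return
    }
    // Stage 3: same key and same prefix seen again; only now do we hash.
    let hashed: Map<string, IndexEntry<V>>
    if (existing instanceof Map) {
      hashed = existing
    } else {
      hashed = new Map<string, IndexEntry<V>>([[structuralHash(existing.value), existing]])
      byPrefix.set(prefix, hashed)
    }
    const hash = structuralHash(value)
    const previous = hashed.get(hash)
    if (previous) previous.multiplicity += multiplicity
    else hashed.set(hash, { value, multiplicity })
  }

  get(key: K): Array<[V, number]> {
    const slot = this.keys.get(key)
    if (slot === undefined) return []
    if (!(slot instanceof Map)) return [[slot.entry.value, slot.entry.multiplicity]]
    const out: Array<[V, number]> = []
    slot.forEach((prefixSlot) => {
      if (prefixSlot instanceof Map) {
        prefixSlot.forEach((e) => out.push([e.value, e.multiplicity]))
      } else {
        out.push([prefixSlot.value, prefixSlot.multiplicity])
      }
    })
    return out
  }
}

// Two comments on the same issue: distinct PKs, so no hashing is needed.
const index = new SketchIndex<number, string, { body: string }>()
index.add(1, "c1", { body: "first" })
index.add(1, "c2", { body: "second" })
const hashesAfterInitialLoad = hashCalls
// The same PK arrives again for the same key (incremental stage): hashing kicks in.
index.add(1, "c1", { body: "first" })
```

In this sketch `hashesAfterInitialLoad` stays at 0 because every row under a key has a distinct prefix; hashing only starts when a key and prefix pair repeats.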

changeset-bot bot commented Sep 13, 2025

🦋 Changeset detected

Latest commit: c0674b1

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 13 packages
Name Type
@tanstack/db-ivm Patch
@tanstack/db Patch
@tanstack/angular-db Patch
@tanstack/electric-db-collection Patch
@tanstack/query-db-collection Patch
@tanstack/react-db Patch
@tanstack/rxdb-db-collection Patch
@tanstack/solid-db Patch
@tanstack/svelte-db Patch
@tanstack/trailbase-db-collection Patch
@tanstack/vue-db Patch
todos Patch
@tanstack/db-example-react-todo Patch

pkg-pr-new bot commented Sep 13, 2025

@tanstack/angular-db

npm i https://pkg.pr.new/@tanstack/angular-db@549

@tanstack/db

npm i https://pkg.pr.new/@tanstack/db@549

@tanstack/db-ivm

npm i https://pkg.pr.new/@tanstack/db-ivm@549

@tanstack/electric-db-collection

npm i https://pkg.pr.new/@tanstack/electric-db-collection@549

@tanstack/query-db-collection

npm i https://pkg.pr.new/@tanstack/query-db-collection@549

@tanstack/react-db

npm i https://pkg.pr.new/@tanstack/react-db@549

@tanstack/rxdb-db-collection

npm i https://pkg.pr.new/@tanstack/rxdb-db-collection@549

@tanstack/solid-db

npm i https://pkg.pr.new/@tanstack/solid-db@549

@tanstack/svelte-db

npm i https://pkg.pr.new/@tanstack/svelte-db@549

@tanstack/trailbase-db-collection

npm i https://pkg.pr.new/@tanstack/trailbase-db-collection@549

@tanstack/vue-db

npm i https://pkg.pr.new/@tanstack/vue-db@549

commit: c0674b1

Contributor

github-actions bot commented Sep 13, 2025

Size Change: 0 B

Total Size: 68.4 kB

ℹ️ View Unchanged
Filename Size
./packages/db/dist/esm/change-events.js 1.13 kB
./packages/db/dist/esm/collection-events.js 672 B
./packages/db/dist/esm/collection.js 10.9 kB
./packages/db/dist/esm/deferred.js 230 B
./packages/db/dist/esm/errors.js 3.1 kB
./packages/db/dist/esm/index.js 1.55 kB
./packages/db/dist/esm/indexes/auto-index.js 745 B
./packages/db/dist/esm/indexes/base-index.js 605 B
./packages/db/dist/esm/indexes/btree-index.js 1.74 kB
./packages/db/dist/esm/indexes/lazy-index.js 1.25 kB
./packages/db/dist/esm/local-only.js 827 B
./packages/db/dist/esm/local-storage.js 2.02 kB
./packages/db/dist/esm/optimistic-action.js 294 B
./packages/db/dist/esm/proxy.js 3.87 kB
./packages/db/dist/esm/query/builder/functions.js 615 B
./packages/db/dist/esm/query/builder/index.js 3.93 kB
./packages/db/dist/esm/query/builder/ref-proxy.js 938 B
./packages/db/dist/esm/query/compiler/evaluators.js 1.52 kB
./packages/db/dist/esm/query/compiler/expressions.js 631 B
./packages/db/dist/esm/query/compiler/group-by.js 2.08 kB
./packages/db/dist/esm/query/compiler/index.js 2.27 kB
./packages/db/dist/esm/query/compiler/joins.js 2.52 kB
./packages/db/dist/esm/query/compiler/order-by.js 1.23 kB
./packages/db/dist/esm/query/compiler/select.js 1.28 kB
./packages/db/dist/esm/query/ir.js 508 B
./packages/db/dist/esm/query/live-query-collection.js 333 B
./packages/db/dist/esm/query/live/collection-config-builder.js 2.59 kB
./packages/db/dist/esm/query/live/collection-subscriber.js 2.4 kB
./packages/db/dist/esm/query/optimizer.js 3.05 kB
./packages/db/dist/esm/SortedMap.js 1.24 kB
./packages/db/dist/esm/transactions.js 3.03 kB
./packages/db/dist/esm/utils.js 943 B
./packages/db/dist/esm/utils/btree.js 6.02 kB
./packages/db/dist/esm/utils/comparison.js 718 B
./packages/db/dist/esm/utils/index-optimization.js 1.62 kB

Contributor

github-actions bot commented Sep 13, 2025

Size Change: 0 B

Total Size: 1.44 kB

ℹ️ View Unchanged
Filename Size
./packages/react-db/dist/esm/index.js 152 B
./packages/react-db/dist/esm/useLiveQuery.js 1.28 kB

@KyleAMathews
Collaborator

Huge improvement!

@samwillis samwillis marked this pull request as ready for review September 13, 2025 18:32
@samwillis samwillis requested a review from kevin-dp September 13, 2025 18:32
@kevin-dp
Contributor

kevin-dp commented Sep 15, 2025

I don't like that we're special-casing the index to handle a specific structure (the prefix keys) that occurs in joins. The indexes already handle keyed streams, so could we move the row's PK into the stream's key instead? What I mean is returning this from the map operator:

    map(([currentKey, namespacedRow]) => {
      // Extract the join key from the main table expression
      const mainKey = compiledMainExpr(namespacedRow)

      return [[mainKey, currentKey], namespacedRow]
    })

Or, if the index isn't good at handling a tuple like that, we could create a single string key:

    return [`${mainKey}-${currentKey}`, namespacedRow]

That way we don't need to special case the index implementation.

@samwillis
Collaborator Author

samwillis commented Sep 15, 2025

The row PK is not the join key; the index is on the join key. The prefix is essentially something extracted from the row that uniquely identifies it, which lets us skip the expensive structural hashing unless the same prefix shows up more than once for a key.
Putting the prefix in the main key (the join key) would break joins.
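
To illustrate why, here is a minimal sketch of an equi-join over keyed rows. It is not the db-ivm join operator: `joinByKey`, `IssueRow`, `CommentRow` and the sample data are hypothetical, and keys are plain strings for simplicity. The join pairs rows whose keys are equal, so once each side folds its own PK into the key, the keys no longer line up.

```typescript
// Hypothetical row shapes for the comments-to-issues example.
type IssueRow = { id: number; title: string }
type CommentRow = { id: string; issueId: number }

// Naive equi-join: pairs left and right rows that share the same key.
function joinByKey<A, B>(
  left: Array<[string, A]>,
  right: Array<[string, B]>
): Array<[string, [A, B]]> {
  const rightIndex = new Map<string, B[]>()
  for (const [key, row] of right) {
    const bucket = rightIndex.get(key)
    if (bucket) bucket.push(row)
    else rightIndex.set(key, [row])
  }
  const out: Array<[string, [A, B]]> = []
  for (const [key, row] of left) {
    for (const match of rightIndex.get(key) ?? []) out.push([key, [row, match]])
  }
  return out
}

const issues: IssueRow[] = [{ id: 1, title: "Bug" }]
const comments: CommentRow[] = [
  { id: "c1", issueId: 1 },
  { id: "c2", issueId: 1 },
]

// Keyed by join key only: both comments match the issue.
const joined = joinByKey(
  comments.map((c): [string, CommentRow] => [String(c.issueId), c]),
  issues.map((i): [string, IssueRow] => [String(i.id), i])
)

// Keyed by `${joinKey}-${pk}`: "1-c1" and "1-c2" never equal "1-1", so nothing matches.
const broken = joinByKey(
  comments.map((c): [string, CommentRow] => [`${c.issueId}-${c.id}`, c]),
  issues.map((i): [string, IssueRow] => [`${i.id}-${i.id}`, i])
)
```

Here `joined` contains two pairs and `broken` is empty, which is why the PK is kept alongside the row as a prefix and not moved into the key.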

Contributor

@kevin-dp kevin-dp left a comment

For the sake of time I'm fine going forward with this implementation, but could we keep the old implementation and use this one only for the joins? I think it makes sense to have a specialised prefix index implementation for joins and keep the old index implementation for operators that don't need the prefix. That means we can also slightly simplify this prefix implementation, since the PrefixMap will always have a TPrefix (so no need for the NO_PREFIX).

@samwillis
Collaborator Author

> That means we can also slightly simplify this prefix implementation since the PrefixMap will always have a TPrefix (so no need for the NO_PREFIX).

Unfortunately we can't, as someone could still use the db-ivm package without prefixed rows going into the join.

I don't like the idea of having to maintain - and add tests for - multiple different index implementations. I would much prefer to have one that worked well until we found a real need to have multiple.
