GitHub - kysely/jazz-vector: Local-first vector similarity search for Jazz Tools


Local-first vector similarity search for Jazz

Store and query high-dimensional vectors directly on-device.

Built on Jazz, you get:

Local-first sync across devices
End-to-end encryption
Real-time multiplayer

When paired with a local embeddings model:

Works fully offline

👉 Try the live demo (source)

Watch the intro, ~5 min

// -- schema.ts --
export const JournalEntry = co.map({
  text: z.string(),
  simpleEmbedding: coV.vector(384), // <--- Define CoVector
});

// -- app.tsx --
const { search, isSearching } = useCoVectorSearch(
  journalEntries,
  (entry) => entry.simpleEmbedding,
  queryEmbedding
);

Use cases

Semantic search: Find notes, docs, or messages by meaning
Personalization: On-device recommendations and adaptive UIs
Knowledge management: Organize personal wikis, journals, or research by concept rather than keyword
Information matching: Connect datasets or peers through embeddings
Context-aware assistants: Build local-first AI helpers that understand the user’s data while keeping it private
Cross-device continuity: Carry embeddings seamlessly across phone, tablet, and desktop without a cloud backend
Creative apps: Enable music, art, or writing tools that find related ideas, motifs, or inspirations

More cool use cases, as per AI

Search & organization

Semantic search: Find notes, docs, emails, or messages by meaning
Smart tagging & clustering: Auto-group related items and generate topic labels
Near-duplicate detection: Merge similar notes, photos, or files
Cross-app search: Index clipboard, screenshots, and files for one-place recall

Personalization & recommendations

On-device recommendations: Rank feeds, reading lists, or media without sending data to the cloud
Context-aware shortcuts: Suggest next actions based on what the user is doing
Session re-ranking: Personalize command palettes, search results, or menus

Retrieval for AI (local RAG)

Context fetching for LLMs: Retrieve relevant chunks from local docs to ground responses
Conversation memory: Pull past chats or notes that match the current topic
Snippet linking: Auto-link related passages across notebooks or PDFs

Media & sensors

Photo & screenshot search: “Find images with whiteboard notes from last week”
Audio similarity: Locate related voice notes, music snippets, or sound effects

Collaboration & P2P

Peer-to-peer matching: Align embeddings between devices to find shared interests or files
Team knowledge linking: Connect related docs across teammates without centralizing raw data
Federated discovery: Share pointers/IDs instead of content; keep source data private

Productivity & dev workflows

Code search: Semantic lookup of functions, symbols, and snippets in local repos
Issue triage: Match new bugs to similar past reports or fixes
Research assistants: Cluster and surface related papers, highlights, and annotations

Safety & housekeeping

Content filtering: On-device NSFW/spam heuristics using similarity
Anomaly detection: Spot outliers in logs or metrics locally
Storage hygiene: Identify stale or redundant items to archive

Installation

Install the package from npm:

npm i jazz-vector

Requires jazz-tools (0.17.4 or later) to be installed already.

Embeddings (Bring Your Own)

Jazz Vector only deals with storage and search. You generate the vectors with any model you like (OpenAI, Hugging Face, or custom), then feed the vectors in.

On-device option (recommended): Use Transformers.js to run models locally for offline, private embedding:

Xenova/all-MiniLM-L6-v2 — 384-dim, ~23 MB
More models →

Alternatively, you can call a server-side model (your own or a commercial one like OpenAI), but note that this removes offline support and may affect user privacy.

Usage

Define your schema

Jazz Vector exposes a new CoVector value type that you should use to define vector embeddings. It takes the number of dimensions of your embeddings.

export const Embedding = coV.vector(384);

Currently, you can perform a vector search only across a CoList of CoMaps that contain an embedding property. For other data structures, see the “Manual index” pattern.

// schema.ts
import { co, z } from "jazz-tools";
import { coV } from "jazz-vector";

// 1) Define an embedding vector schema with expected dimension count
export const Embedding = coV.vector(384);

export const JournalEntry = co.map({
  text: z.string(),

  // 2) Use an embedding schema inside an entity
  embedding: Embedding,
});

// 3) Define a searchable CoList of items containing embeddings property
export const JournalEntryList = co.list(JournalEntry);

Since CoVector is a simple wrapper around Jazz's built-in FileStream, all the FileStream patterns apply (permissions, loading, etc).

Create & index the data

We recommend generating the embedding vector at the time you write a CoValue, because:

the writer naturally owns the data
the new CoValue is automatically indexed for all subsequent reader peers

Alternatively, you can create the embeddings in a server worker after the CoValue is created; Jazz syncs them automatically.

Instantiate a CoVector using the createFrom method:

await Embedding.createFrom([0.018676748499274254, -0.06785402446985245,...])

The instance of a CoVector can be assigned as a value as expected by the schema.

// create.ts
import { JournalEntry, Embedding } from "./schema.ts";
import { createEmbedding } from "./your-code";

// 1) Generate embeddings (bring your own embeddings model)
const vector: number[] = await createEmbedding("Text");

const journalEntry = JournalEntry.create({
  text: "Text",

  // 2) Instantiate and assign a `CoVector` from a specific vector (`number[]`)
  embedding: await Embedding.createFrom(vector),
});

journalEntries.push(journalEntry);

Use semantic search

The vector search is performed locally in memory on top of Jazz's CoList.

As such, you need to first load the CoList you wish to search across manually.

// app.tsx
import { useCoState } from "jazz-tools/react";
import { useCoVectorSearch } from "jazz-vector/react";

import { JournalEntryList } from "./schema.ts";

// 1) Load a searchable list (that has elements containing embeddings)
const journalEntries = useCoState(
  JournalEntryList,
  me.root.journalEntries.id,
  { resolve: { $each: true } }
);

Then, pass the searchable list along with:

a getter for the embedding vector property on each list item
the embedding vector for the search query (or null, which passes your list through unchanged)

// 2) Search the list
const { search, isSearching } = useCoVectorSearch(
  journalEntries,             // <- loaded list to search in
  (entry) => entry.embedding, // <- embedding property getter on each list item
  queryEmbedding              // <- embeddings of search query (number[]), or null to pass through
);

You can filter the data before passing it to CoVectorSearch to search on a subset of your list.

There are 2 search functions available:

useCoVectorSearch hook for React apps
searchCoVector function for server workers or vanilla JS

Patterns

Manual “index”

Currently, vector search works only across a CoList of CoMaps that contain an embedding property. To search data stored in a different data structure (or across several), you'll need to construct and maintain a searchable list manually.

For example, suppose you have a recursive Block schema:

// -- schema.ts
import { co, z } from "jazz-tools";

// Recursive data structure
const Block = co.map({
  text: z.string(),
  get childBlock() {
    return Block.optional();
  },
  get parentBlock() {
    return Block.optional();
  },
});

You can construct a simplified list of searchable objects that hold the embedding vector and a reference to the original Block instance.

// -- schema.ts
import { co, z } from "jazz-tools";
import { coV } from "jazz-vector";

const Block = co.map({ ... });

// Simple embedding + reference
export const SearchableBlock = co.map({
  block: Block,
  embedding: coV.vector(1536),
});

// Flat searchable list of references with embeddings
export const BlocksIndex = co.list(SearchableBlock);

// -- query.tsx
const { search, isSearching } = useCoVectorSearch(
  searchableBlocksList,
  (block) => block.embedding,
  queryEmbedding
);

// `search.results` returns results over `SearchableBlock`
search.results.map(searchResult => {
  const searchableBlock = searchResult.value
  const block = searchableBlock.block // derefs and loads the `Block` instance
})

This pattern of manually constructing a single “index” is also useful for searching across various data types inside your app (e.g. notes, photos, messages).
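As a rough illustration of maintaining such an index, the sketch below flattens a chain of nested blocks into one searchable list. It uses plain objects rather than CoValues so that it runs on its own: `Block`, `SearchableBlock`, `Embed`, and `buildBlocksIndex` are hypothetical stand-ins, and the embedding function is whatever model you bring.

```typescript
// Plain-object stand-ins for the `Block` and `SearchableBlock` schemas
// (hypothetical; a real app would use the CoMaps defined in schema.ts).
type Block = { text: string; childBlock?: Block };
type SearchableBlock = { block: Block; embedding: number[] };

// Hypothetical embedding function supplied by the caller.
type Embed = (text: string) => Promise<number[]>;

// Walks the chain of child blocks and produces one flat index entry per block.
async function buildBlocksIndex(
  root: Block,
  embed: Embed,
): Promise<SearchableBlock[]> {
  const index: SearchableBlock[] = [];
  for (
    let block: Block | undefined = root;
    block;
    block = block.childBlock
  ) {
    index.push({ block, embedding: await embed(block.text) });
  }
  return index;
}
```

In a Jazz app, each entry would presumably be a `SearchableBlock` CoMap pushed into `BlocksIndex`, with the vector created through `createFrom`. Keeping the index in step with later edits and deletions remains the app's responsibility.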

Server-side embedding model

The library expects you to bring your own embeddings, so you're free to use either a local or a server-side model.

Using a server-side embedding model makes sense when, for example:

you want to optimize client app package size
you want to offload client CPU cycles when creating embeddings for huge amounts of data
you want larger, specialized, or proprietary models
you want easier centralized upgrades.

The trade-offs are:

loss of offline capability
higher latency and failure modes (network/timeouts)
per-request cost/rate limits
privacy implications because user text leaves the device.

Dual embeddings

You can put embedding vectors of various dimensions on a single CoValue.

This lets you use different embedding models for search tasks of varying difficulty, for example:

a small, simple embedding model on the client to power on-device search
a powerful commercial embedding model on the server for RAG

// schema.ts
export const JournalEntry = co.map({
  text: z.string(),
  simpleEmbedding: coV.vector(384),
  largeEmbedding: coV.vector(3072),
});

The CoVectorSearch dereferences and loads the actual CoVector value (the embedding vector) only upon search.

// query.tsx (on the client device)
const { queryEmbeddings } = useSimpleEmbeddings(...) // returns 384-dimensional vector

const { search, isSearching } = useCoVectorSearch(
  journalEntries,
  (entry) => entry.simpleEmbedding,
  queryEmbeddings
);

// search.ts (on the server)
const response = await openai.embeddings.create(...)
const queryEmbeddings = response.data[0].embedding // 3072-dimensional vector

// `searchCoVector` is asynchronous, so await its result
const searchResults = await searchCoVector(
  journalEntries,
  (entry) => entry.largeEmbedding,
  queryEmbeddings
);

API

`coV.vector()`

Defines a CoVector schema in the Jazz storage schema.

Parameters:

Parameter	Type	Description
`dimensions`	Number	The number of embedding vector dimensions (length)

Returns

CoVector schema (extension of Jazz's built-in FileStream schema with .createFrom method)

`CoVector.createFrom()`

Creates an instance of CoVector from a CoVector schema.

Parameters:

Parameter	Type	Description
`vector`	Array of Number; or `Float32Array`	The raw vector data. Its length must exactly match the dimension defined in the schema.
`options`	Jazz ownership object	Jazz's native ownership options
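Because the vector length has to match the schema exactly, it can help to check the model output before calling `createFrom`. The helper below is a minimal sketch of such a check. `toVector` is a hypothetical name and not part of jazz-vector, and how the library itself reports a mismatch is not covered here.

```typescript
// Hypothetical helper: converts raw model output to a Float32Array and
// verifies that it has the expected number of dimensions.
function toVector(raw: ArrayLike<number>, dimensions: number): Float32Array {
  if (raw.length !== dimensions) {
    throw new Error(
      `Expected ${dimensions} dimensions, received ${raw.length}`,
    );
  }
  const vector = Float32Array.from(raw);
  // NaN or Infinity usually indicates a bug upstream in the embedding step.
  if (!vector.every((x) => Number.isFinite(x))) {
    throw new Error("Vector contains non-finite values");
  }
  return vector;
}

// Intended use with the schema from this README:
// await Embedding.createFrom(toVector(modelOutput, 384));
```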

Returns

CoVector (Jazz FileStream)

`useCoVectorSearch()` (React only)

Performs a vector search on a CoList. React hook.

Automatically recalculates the results when the searched list or query changes.

Parameters:

Parameter	Type	Description
`list`	`CoList` or `undefined` or `null`	An instance of CoList to search in.
`embeddingGetter`	Function	Getter function for the embedding property on each list item.
`queryEmbeddings`	`number[]` or `Float32Array` or `null`	Embedding vector for the search query. When query is `null`, the entire list will be passed through.
`filterOptions`	`{ limit: N }` or `{ similarityThreshold: N }` or `{ similarityTopPercent: N }`	Controls how many results are returned. `limit` sets the maximum number of results; `similarityThreshold` keeps only results with at least the given similarity score; `similarityTopPercent` keeps the top N percent of results, based on the highest score. Default: `{ limit: 10 }`
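To make the three `filterOptions` variants concrete, here is a minimal sketch of how each one could trim an already ranked result list. It is an illustration, not the library's implementation: `Scored`, `FilterOptions`, and `applyFilter` are hypothetical names, and the reading of `similarityTopPercent` (N as a 0 to 100 percentage of the result count) is an assumption that the library may not share.

```typescript
type Scored<T> = { value: T; similarity: number };

type FilterOptions =
  | { limit: number }
  | { similarityThreshold: number }
  | { similarityTopPercent: number };

// `ranked` must already be sorted by similarity, highest first.
function applyFilter<T>(
  ranked: Scored<T>[],
  options: FilterOptions = { limit: 10 },
): Scored<T>[] {
  if ("limit" in options) {
    // Keep at most `limit` results.
    return ranked.slice(0, options.limit);
  }
  if ("similarityThreshold" in options) {
    // Keep results scoring at least the threshold.
    const threshold = options.similarityThreshold;
    return ranked.filter((result) => result.similarity >= threshold);
  }
  // ASSUMPTION: N is a percentage (0 to 100) of the number of results.
  const keep = Math.ceil((ranked.length * options.similarityTopPercent) / 100);
  return ranked.slice(0, keep);
}
```

A threshold adapts to how good the matches are, whereas `limit` always returns a fixed number of items even when most of them are only weakly related.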

Returns

Parameter	Type	Description
`isSearching`	Boolean	Whether a search is currently in progress.
`search`	`CoVectorSearchResult` (see details)	Search results.
`error`	String (optional)	Error from the search, if any.

`searchCoVector()` (server or vanilla JS)

Performs a vector search on a CoList. An asynchronous function for use in a server worker or in vanilla JS code.

Parameters:

Parameter	Type	Description
`list`	`CoList` or `undefined` or `null`	An instance of CoList to search in.
`embeddingGetter`	Function	Getter function for the embedding property on each list item.
`queryEmbeddings`	`number[]` or `Float32Array` or `null`	Embedding vector for the search query. When query is `null`, the entire list will be passed through.
`options`	Object (optional)	Additional search options
`options.filterOptions`	`{ limit: N }` or `{ similarityThreshold: N }` or `{ similarityTopPercent: N }`	Controls how many results are returned. `limit` sets the maximum number of results; `similarityThreshold` keeps only results with at least the given similarity score; `similarityTopPercent` keeps the top N percent of results, based on the highest score. Default: `{ limit: 10 }`
`options.abortSignal`	`AbortSignal`	Allows the search to be aborted
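One common use of an abort signal is to cancel a search that has been superseded, for example when the user keeps typing. The sketch below shows that pattern with the standard `AbortController`. `SearchFn` and `createLatestOnlySearch` are hypothetical names, and the `search` argument stands in for a call such as `searchCoVector`; the call shape in the trailing comment is assumed from the parameter table.

```typescript
// Hypothetical stand-in for any search call that accepts an AbortSignal.
type SearchFn<R> = (query: number[], abortSignal: AbortSignal) => Promise<R>;

// Wraps a search function so that starting a new search aborts the previous one.
function createLatestOnlySearch<R>(search: SearchFn<R>) {
  let current: AbortController | undefined;

  return (query: number[]): Promise<R> => {
    current?.abort(); // signal the previous search (if any) to stop
    current = new AbortController();
    return search(query, current.signal);
  };
}

// Assumed usage, based on the parameters listed above:
// const run = createLatestOnlySearch((query, abortSignal) =>
//   searchCoVector(list, (item) => item.embedding, query, { abortSignal }),
// );
```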

Returns:

CoVectorSearchResult (see details)

`CoVectorSearchResult` (type)

Result of the vector search call.

Has 3 variants based on input list:

undefined when input list is undefined
null when input list is null
Object (see below) when input list has data

Parameter	Type	Description
`didSearch`	Boolean	Determines whether the search was performed or not.
`durationMs`	Number (optional)	Duration of the vector search in milliseconds.
`results`	Array	Array of results, sorted by `similarity` from highest to lowest; or the original list data if the query was `null`
`results[].value`	CoList item type	The original item from the CoList
`results[].similarity`	Number (optional) between `-1` and `1`	Similarity score of this value to the query. Present only if the search was performed (the input query was set)

When the input query is null, the search passes all of the original data through, wrapped in the CoVectorSearchResult type (with didSearch: false and a results array without similarity scores).
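Handling all three variants explicitly keeps UI code predictable. The sketch below mirrors the documented shape in a local type so that it runs on its own. `SearchResult` and `describeSearch` are hypothetical names, and the type exported by the library may differ in detail.

```typescript
// Local mirror of the documented result shape (an assumption, not the
// library's exported type).
type SearchResult<T> =
  | undefined
  | null
  | {
      didSearch: boolean;
      durationMs?: number;
      results: { value: T; similarity?: number }[];
    };

// Turns any of the three variants into a short status message.
function describeSearch<T>(search: SearchResult<T>): string {
  if (search === undefined) return "list not loaded yet";
  if (search === null) return "list unavailable";
  if (!search.didSearch) {
    // Query was null: items are passed through without similarity scores.
    return `${search.results.length} items (no query, unranked)`;
  }
  const best = search.results[0]?.similarity;
  return best === undefined
    ? "no matches"
    : `${search.results.length} matches, best similarity ${best.toFixed(2)}`;
}
```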

Status & Roadmap

The current version is a first, basic (even naive), unoptimized implementation of vector storage and search.

The search simply loads all vectors one by one, calculates similarity scores, and sorts the results. Performance is poor: searching only 1,500 384-dim vectors takes ~115 ms in Safari on an M1 Pro.

However, it is a fully working semantic search.
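The brute-force approach described above can be sketched in plain TypeScript. This is an illustration only, not the library's code: it assumes cosine similarity as the score (consistent with the documented range of -1 to 1, but still an assumption), and `Item`, `cosineSimilarity`, and `naiveSearch` are hypothetical names.

```typescript
// Hypothetical item shape: any value paired with its embedding vector.
type Item<T> = { value: T; embedding: number[] };

// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  // A zero vector has no direction; treat it as unrelated.
  return denominator === 0 ? 0 : dot / denominator;
}

// Scores every item against the query, then sorts from most to least similar.
function naiveSearch<T>(items: Item<T>[], query: number[], limit = 10) {
  return items
    .map((item) => ({
      value: item.value,
      similarity: cosineSimilarity(item.embedding, query),
    }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}
```

Each query touches every vector, so the cost grows linearly with both the number of items and the number of dimensions. That is the cost a true vector index is meant to avoid.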

Next steps for this lib:

build a true vector index (first milestone: search across 100k vectors within 100 ms)
performance optimizations for calculating similarity scores
see TODOs in code
build bindings for Svelte, etc. (looking for contributors!)

Development

npm install

npm run build

License

MIT License
