fix: hierarchical document by review #159

Closed

cutecutecat wants to merge 1 commit into tensorchord:main from cutecutecat:fix-hierarchical

Member

cutecutecat commented Jan 4, 2026

Addresses most of the review comments in #158.


          fix: hierarchical document by review

bcf7704

Signed-off-by: cutecutecat <junyuchen@tensorchord.ai>

vercel bot commented Jan 4, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Review	Updated (UTC)
pgvecto-rs-docs	Ready	Preview, Comment	Jan 4, 2026 2:16am

vercel bot deployed to Preview

January 4, 2026 02:16

cutecutecat mentioned this pull request

add indexing guide for memory and time saving #158

Merged

cutecutecat requested a review from Copilot

January 4, 2026 02:28

cutecutecat marked this pull request as ready for review

January 4, 2026 02:28

Copilot started reviewing on behalf of cutecutecat

January 4, 2026 02:28

This comment was marked as resolved.

cutecutecat requested a review from usamoi

January 4, 2026 02:50

usamoi reviewed

src/vectorchord/usage/indexing.md (resolved)

usamoi reviewed

src/vectorchord/usage/indexing.md (resolved)

usamoi reviewed

src/vectorchord/usage/indexing.md (resolved)

usamoi reviewed

src/vectorchord/usage/indexing.md

-              If the build speed is still unsatisfactory, you can use the hierarchical clustering to accelerate the process at the expense of some accuracy. In our [benchmark](https://blog.vectorchord.ai/how-we-made-100m-vector-indexing-in-20-minutes-possible-on-postgresql#heading-hierarchical-k-means), the hierarchical clustering was 100 times faster than the default algorithm, while query accuracy decreased by less than 1%.
+              For large tables with more than 50 million rows, the `build.internal` process requires significant time and memory. Let $D$ be the vector dimension used for partition, $C$ be `build.internal.lists[-1]`, $F$ be `build.internal.sampling_factor`, $L$ be `build.internal.kmeans_iterations`, and $T$ be `build.internal.build_threads`. The build time is approximately $O(FC^2DL)$, which usually takes more than one day.
+              If this applies to you, you can use the hierarchical clustering to speed up the process, albeit at the expense of some accuracy. In our [benchmark](https://blog.vectorchord.ai/how-we-made-100m-vector-indexing-in-20-minutes-possible-on-postgresql#heading-hierarchical-k-means), hierarchical clustering was 100 times faster than the default algorithm, while query recall decreased only from 95.6% to 94.9%.

Collaborator

usamoi Jan 4, 2026

What does "If this applies to you," refer to? Also, the speedup should be 400 times, not 100 times.
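
For readers following this thread, the $O(FC^2DL)$ estimate quoted in the hunk above can be illustrated with a minimal Python sketch. It only evaluates the $F \cdot C^2 \cdot D \cdot L$ term with constants dropped; the function name `relative_build_cost` and all sample values are hypothetical and are not taken from the PR, from VectorChord, or from any benchmark.

```python
def relative_build_cost(sampling_factor: int, lists: int, dim: int, iterations: int) -> int:
    """Unitless F * C^2 * D * L term from the quoted estimate (constants dropped)."""
    return sampling_factor * lists**2 * dim * iterations


# Hypothetical settings, chosen only to show how the term scales.
baseline = relative_build_cost(sampling_factor=256, lists=160_000, dim=768, iterations=10)
fewer_lists = relative_build_cost(sampling_factor=256, lists=80_000, dim=768, iterations=10)
lower_dim = relative_build_cost(sampling_factor=256, lists=160_000, dim=384, iterations=10)

print(baseline / fewer_lists)  # halving C: 4.0
print(baseline / lower_dim)    # halving D: 2.0
```

Under this estimate the cost is quadratic in $C$ and linear in $D$, $F$, and $L$. The sketch says nothing about accuracy or about absolute build time.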

usamoi reviewed

src/vectorchord/usage/indexing.md (resolved)

usamoi reviewed

src/vectorchord/usage/indexing.md

-              ---
+              ## Tuning: Optimize the memory usage with indexing
+              When the indexing process starts, VectorChord shows the estimated amount of memory that will be allocated, such as:

Collaborator

usamoi Jan 4, 2026

It does not specify where this will be displayed. In addition, due to settings, users may not see this message at all.

usamoi reviewed

src/vectorchord/usage/indexing.md


		When the indexing process starts, VectorChord shows the estimated amount of memory that will be allocated, such as:

		```shell

Collaborator

usamoi Jan 4, 2026

Why shell?

usamoi reviewed

src/vectorchord/usage/indexing.md

+              INFO:  clustering: estimated memory usage is 1.49 GiB
+              ```
+              If the value exceeds your expectations or the physical memory constraint, it is wise to cancel and check this chapter. There are some options that can help reduce memory usage.

Collaborator

usamoi Jan 4, 2026

How to cancel?

usamoi reviewed

src/vectorchord/usage/indexing.md (resolved)

usamoi reviewed

src/vectorchord/usage/indexing.md (resolved)

usamoi reviewed

src/vectorchord/usage/indexing.md (resolved)

usamoi reviewed

src/vectorchord/usage/indexing.md

+              * C: `build.internal.lists[-1]`.
-              If you encounter an Out-of-Memory (OOM) error, reducing $D$, $C$ or $F$ will lower the memory usage. Based on our [experience](https://blog.vectorchord.ai/how-we-made-100m-vector-indexing-in-20-minutes-possible-on-postgresql#heading-dimensionality-reduction), reducing `D` will have the least impact on accuracy, so that could be a good starting point. Decreasing `F` is also plausible. Since `C` is much more sensitive, it should be the last thing you consider.
+              Based on our [experience](https://blog.vectorchord.ai/how-we-made-100m-vector-indexing-in-20-minutes-possible-on-postgresql#heading-dimensionality-reduction), reducing `D` will have the least impact on accuracy, so that could be a good starting point. Decreasing `F` is also plausible. Since `C` is much more sensitive, it should be the last thing you consider.

Collaborator

usamoi Jan 4, 2026

This is highly suspicious.

cutecutecat closed this
