
Commit 46b407d

add indexing guide for memory and time saving (#156)

When to use:
- kmeans_dimension
- hierarchical clustering

Signed-off-by: cutecutecat <junyuchen@tensorchord.ai>

1 parent d6e531a commit 46b407d

File tree

1 file changed: +51 −15 lines


src/vectorchord/usage/indexing.md

@@ -27,7 +27,7 @@ You can also add filters to vector search queries as needed.
 SELECT * FROM items WHERE id % 7 <> 0 ORDER BY embedding <-> '[3,1,2]' LIMIT 10;
 ```
 
-## Tuning
+## Tuning: Balance query throughput and accuracy
 
 When there are fewer than $100,000$ rows in the table, you usually don't need to set parameters for search and query.

@@ -58,22 +58,19 @@ The parameter `lists` should be tuned based on the number of rows. The following
 | $N \in [2 \times 10^6, 5 \times 10^7)$ | $L \in [4 \sqrt{N}, 8 \sqrt{N}]$ | `[10000]` |
 | $N \in [5 \times 10^7, \infty)$ | $L \in [8 \sqrt{N}, 16\sqrt{N}]$ | `[80000]` |

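As a worked instance of the rule in the table above (the row count $N = 4 \times 10^6$ is an illustrative value, not from the commit):

$$
\sqrt{N} = 2000, \qquad L \in [4\sqrt{N},\ 8\sqrt{N}] = [8000,\ 16000],
$$

so the suggested `lists = [10000]` for this bracket falls inside the recommended band.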
-The process of building an index involves two steps: partitioning the vector space first, and then inserting rows into the index. The first step, partitioning the vector space, can be sped up using multiple threads.
+The process of building an index involves two steps: clustering the vectors first, and then inserting vectors into the index. The first step, clustering the vectors, can be sped up using multiple threads.
 
 ```sql
 CREATE INDEX ON items USING vchordrq (embedding vector_l2_ops) WITH (options = $$
 [build.internal]
 lists = [1000]
 build_threads = 8
 $$);
-
-SET vchordrq.probes TO '10';
-SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 10;
 ```

-The second step, inserting rows, can be parallelized using multiple processes. Refer to [PostgreSQL Tuning](performance-tuning.md).
+The second step, inserting vectors into the index, can be parallelized using the appropriate GUC parameter. Refer to [PostgreSQL Tuning](performance-tuning.md).

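The authoritative list of knobs is in the linked tuning guide; as a minimal sketch, assuming the standard PostgreSQL parallel-maintenance settings are the relevant ones here (the values are illustrative):

```sql
-- Assumption: the insert phase honors PostgreSQL's parallel
-- maintenance settings; see the PostgreSQL Tuning page for
-- the settings that actually apply to this index type.
SET max_parallel_maintenance_workers = 8;
SET maintenance_work_mem = '8GB';
```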
-For most datasets using cosine similarity, enabling `residual_quantization` and `build.internal.spherical_centroids` improves both QPS and recall.
+For most datasets using cosine similarity, enabling `residual_quantization` and `build.internal.spherical_centroids` may improve both QPS and recall. If possible, verify this on data from the production environment.
 
 ```sql
 CREATE INDEX ON items USING vchordrq (embedding vector_cosine_ops) WITH (options = $$
@@ -83,27 +80,66 @@ lists = [1000]
 spherical_centroids = true
 build_threads = 8
 $$);
+```
 
-SET vchordrq.probes TO '10';
-SELECT * FROM items ORDER BY embedding <=> '[3,1,2]' LIMIT 10;
+## Tuning: Improve build speed
+
+For large tables (> 50 million rows), the `build.internal` process requires significant time and memory. Let the vector dimension be $D$, `build.internal.lists[-1]` be $C$, `build.internal.sampling_factor` be $F$, `build.internal.kmeans_iterations` be $L$, and `build.internal.build_threads` be $T$.
+
+* The memory consumption is approximately $4CD(F + T + 1)$ bytes, which usually exceeds 128 GB.
+* The build time is approximately $O(FC^2DL)$, which usually amounts to more than one day.
+
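Plugging the values used elsewhere in this guide into the memory formula ($C = 160000$, $D = 768$, $F = 256$, $T = 8$) gives a rough estimate:

$$
4CD(F + T + 1) = 4 \cdot 160000 \cdot 768 \cdot (256 + 8 + 1) \approx 1.3 \times 10^{11}\ \text{bytes} \approx 130\ \text{GB},
$$

consistent with the more-than-128 GB figure.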
+To improve the build speed, you may opt to use more shared memory to accelerate the process by setting `build.pin` to `2`.
+
+```sql
+CREATE INDEX ON items USING vchordrq (embedding vector_l2_ops) WITH (options = $$
+build.pin = 2
+[build.internal]
+lists = [160000]
+build_threads = 8
+$$);
 ```

-For large tables, you may opt to use more shared memory to accelerate the process by setting `build.pin` to `2`.
+If the build speed is still unsatisfactory, you can use hierarchical clustering to accelerate the process at the expense of some accuracy. In our benchmark, hierarchical clustering was 100 times faster than the default Lloyd clustering, while query accuracy decreased by less than 1%.
 
 ```sql
 CREATE INDEX ON items USING vchordrq (embedding vector_l2_ops) WITH (options = $$
-residual_quantization = true
 build.pin = 2
 [build.internal]
-lists = [1000]
-spherical_centroids = true
+lists = [160000]
 build_threads = 8
+kmeans_algorithm.hierarchical = {}
 $$);
 ```
 
-For large tables, the `build.internal` process costs significant time and memory. Let `build.internal.kmeans_dimension` or the dimension be $D$, `build.internal.lists[-1]` be $C$, `build.internal.sampling_factor` be $F$, and `build.internal.build_threads` be $T$. The memory consumption is approximately $4CD(F + T + 1)$ bytes. You can moderately reduce these options for lower memory usage.
+## Tuning: Save more memory
+
+As discussed above, these parameters determine the memory usage of the index build, approximately $4CD(F + T + 1)$ bytes:
+
+* $D$: vector dimension, or `build.internal.kmeans_dimension` if set
+* $C$: `build.internal.lists[-1]`
+* $F$: `build.internal.sampling_factor`
+* $T$: `build.internal.build_threads`
+
+If you encounter an Out-of-Memory (OOM) error, reducing these parameters will lower memory usage. In our experience, reducing $D$ has the least impact on accuracy, so it is a good starting point; decreasing $F$ is a plausible next step. Since accuracy is much more sensitive to $C$, it should be the last thing you consider.
+
+For reference, this configuration has little impact on query accuracy (less than 1%):
+
+* Reduce `D` from 768 to 100
+* Reduce `F` from 256 to 64
+
+```sql
+CREATE INDEX ON items USING vchordrq (embedding vector_l2_ops) WITH (options = $$
+build.pin = 2
+[build.internal]
+lists = [160000]
+build_threads = 8
+kmeans_algorithm.hierarchical = {}
+kmeans_dimension = 100
+sampling_factor = 64
+$$);
+```
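To gauge the accuracy trade-off after rebuilding with these options, the query pattern from the examples earlier in this guide can be reused (the probe count `10` is illustrative):

```sql
-- Illustrative check: a higher probe count trades QPS for recall.
SET vchordrq.probes TO '10';
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 10;
```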
 
-You can also refer to [External Build](external-index-precomputation) to offload the indexing workload to other machines.
+If the decrease in accuracy is unacceptable, you can also refer to [External Build](external-index-precomputation) to offload the indexing workload to other machines.
 
 ## Reference
