Disallow dot_product and max_inner_product for int8_hnsw GPU type #136881
Conversation
For int8_hnsw, during merges we get quantized vectors from Lucene files but drop each quantized vector's correction factor. For the cosine and euclidean metrics this correction factor is not important, but for the dot_product and max_inner_product metrics it is. This means that GPU graph building currently doesn't work well for dot_product and max_inner_product and may produce bad recall, so this PR disallows these metrics. Alternatives: for most datasets (the vast majority), we can substitute dot_product with cosine. But some datasets require max_inner_product; for those, int8_hnsw will not work and hnsw should be used instead.
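For context, here is a minimal sketch of why int8 scalar quantization needs a per-vector correction term to reconstruct dot products. This is a hypothetical toy quantizer, not Lucene's actual implementation: each float x is stored as a byte-range value q with x ≈ ALPHA * q + MIN, so dot(x, y) only comes out right if a correction built from sum(q) of each vector is kept alongside the quantized bytes.

```java
// Toy int8-style scalar quantizer (hypothetical; not Lucene's code) showing
// that dropping the per-vector correction term breaks dot_product scores.
public class QuantizedDotDemo {
    static final float MIN = -1f;
    static final float ALPHA = 2f / 255f;   // maps [-1, 1] onto quantized range [0, 255]

    // x_i ≈ ALPHA * q_i + MIN
    static int[] quantize(float[] v) {
        int[] q = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            q[i] = Math.round((v[i] - MIN) / ALPHA);
        }
        return q;
    }

    static int sum(int[] q) {
        int s = 0;
        for (int x : q) s += x;
        return s;
    }

    static long dotQ(int[] a, int[] b) {
        long d = 0;
        for (int i = 0; i < a.length; i++) d += (long) a[i] * b[i];
        return d;
    }

    static float trueDot(float[] a, float[] b) {
        float d = 0;
        for (int i = 0; i < a.length; i++) d += a[i] * b[i];
        return d;
    }

    // Expanding (ALPHA*qx_i + MIN)(ALPHA*qy_i + MIN) gives:
    //   dot(x, y) = ALPHA^2 * dot(qx, qy)
    //             + ALPHA * MIN * (sum(qx) + sum(qy))   <- per-vector corrections
    //             + dims * MIN * MIN
    static float reconstruct(int[] qx, int[] qy, boolean withCorrection) {
        float d = ALPHA * ALPHA * dotQ(qx, qy);
        if (withCorrection) {
            d += ALPHA * MIN * (sum(qx) + sum(qy)) + qx.length * MIN * MIN;
        }
        return d;
    }

    public static void main(String[] args) {
        float[] x = { 0.5f, -0.25f, 0.75f, -0.5f };
        float[] y = { -0.5f, 0.25f, 0.5f, 0.75f };
        int[] qx = quantize(x), qy = quantize(y);
        System.out.println("true dot:           " + trueDot(x, y));              // -0.3125
        System.out.println("with correction:    " + reconstruct(qx, qy, true));  // close to true dot
        System.out.println("without correction: " + reconstruct(qx, qy, false)); // far off
    }
}
```

The sum(q)-based term differs per vector, so dropping it shifts each vector's dot_product score by a different amount and distorts neighbor ordering during graph build; this is the recall loss described above.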
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Hi @mayya-sharipova, I've created a changelog YAML for you.
KnnIndexTester results: note the low recall for dot_product on GPU, while recall recovers for cosine (int8 dot_product vs. int8 cosine result tables). But for other datasets there is no difference, e.g. hotpotqa-arctic: 5.2M docs, 768 dims, 8 indexing threads.
}
return new ES92GpuHnswVectorsFormat(hnswIndexOptions.m(), efConstruction);
} else if (indexOptions.getType() == DenseVectorFieldMapper.VectorIndexType.INT8_HNSW) {
    if (similarity == DenseVectorFieldMapper.VectorSimilarity.DOT_PRODUCT
I think this should just silently change to cosine when using the GPU integration to build the index; then using DOT_PRODUCT on search would work just fine. This way we only disallow max_inner_product.
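For why the silent swap is safe: Elasticsearch requires unit-length vectors for the dot_product similarity on float vectors, and for unit vectors cosine similarity and dot product are the same score. A quick sketch (standalone illustration, not Elasticsearch code):

```java
// Illustrates that cosine similarity equals dot product for unit-length
// vectors, which is why building the GPU graph with cosine is safe when
// the mapping declares dot_product (ES enforces normalized vectors there).
public class CosineEqualsDotForUnitVectors {
    static float dot(float[] a, float[] b) {
        float d = 0;
        for (int i = 0; i < a.length; i++) d += a[i] * b[i];
        return d;
    }

    static float cosine(float[] a, float[] b) {
        return dot(a, b) / (float) (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    static float[] normalize(float[] v) {
        float norm = (float) Math.sqrt(dot(v, v));
        float[] u = new float[v.length];
        for (int i = 0; i < v.length; i++) u[i] = v[i] / norm;
        return u;
    }

    public static void main(String[] args) {
        float[] a = normalize(new float[] { 1f, 2f, 3f });
        float[] b = normalize(new float[] { -2f, 1f, 0.5f });
        // For unit-length inputs the two scores coincide (up to float rounding).
        System.out.println(dot(a, b));
        System.out.println(cosine(a, b));
    }
}
```

No such equivalence exists for max_inner_product, where vector magnitude is meaningful, hence that metric stays disallowed.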
Good suggestion, I will follow up on this.
Addressed in 60e1dc9
I think this is good. I have one concern about element_type: byte vs. quantized element_type: float, which both would use CuVSMatrix.DataType.BYTE, right?
x-pack/plugin/gpu/src/main/java/org/elasticsearch/xpack/gpu/codec/ES92GpuHnswVectorsWriter.java
…dec/ES92GpuHnswVectorsWriter.java Co-authored-by: Benjamin Trent <[email protected]>
💔 Backport failed
You can use sqren/backport to manually backport by running the backport command.