
Conversation

@mayya-sharipova (Contributor) commented Oct 21, 2025

For int8_hnsw, during merges we get quantized vectors from Lucene files but drop each quantized vector's correction factor. For the cosine and euclidean metrics this correction factor is not important, but for dot_product and max_inner_product it is. This means that currently, for dot_product and max_inner_product, GPU graph building doesn't work well and may produce bad recall. This PR does the following:

  • disallows max_inner_product for int8
  • internally substitutes dot_product with cosine for the GPU graph build

Alternatives: for most datasets (the vast majority), we can substitute dot_product with cosine. But some datasets require max_inner_product; for those, "int8_hnsw" will not work and "hnsw" should be used instead.
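
As background on why the dropped correction matters for some metrics and not others, here is a brief sketch. It assumes the usual affine scalar quantization, where each float component is reconstructed from its stored byte as $x_i \approx \alpha b_i + m$ (scale $\alpha$, offset $m$); this is illustrative background, not part of the change itself. Expanding the dot product of two reconstructed vectors of dimension $d$:

$$
x \cdot y \;\approx\; \alpha^2 (b_x \cdot b_y) \;+\; \alpha m \sum_i b_{x,i} \;+\; \alpha m \sum_i b_{y,i} \;+\; d\,m^2
$$

The byte-sum terms are what the per-vector correction captures, so dropping it shifts every dot_product and max_inner_product score by a vector-dependent amount, which can reorder neighbors while the graph is built. For euclidean the offset cancels term by term,

$$
\lVert x - y \rVert^2 \;\approx\; \alpha^2 \lVert b_x - b_y \rVert^2,
$$

so the quantized distance is just a scaled version of the true one and no per-vector correction is needed, consistent with only dot_product and max_inner_product being affected.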

@mayya-sharipova added the >bug, :Search Relevance/Search, v9.2.1, v9.3.0, test-gpu, and auto-backport labels on Oct 21, 2025
@elasticsearchmachine added the Team:Search Relevance label on Oct 21, 2025
@elasticsearchmachine (Collaborator) commented:

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine (Collaborator) commented:

Hi @mayya-sharipova, I've created a changelog YAML for you.

@mayya-sharipova (Contributor Author) commented Oct 21, 2025

KnnIndexTester
openai: 2.6M docs; 1536 dims; 8 indexing threads

Notice low recall for dot_product on gpu, while recall recovers for cosine.

int8 dot_product

| index_type | index_time (ms) | force_merge_time (ms) | QPS (multiple segs) | recall (multiple segs) | QPS (force_merge, 5 segs) | recall (force_merge, 5 segs) |
|---|---|---|---|---|---|---|
| cpu | 697877 | 152161 | 95 | 0.84 | 148 | 0.84 |
| gpu | 235097 | 29671 | 117 | 0.71 | 185 | 0.48 |

int8 cosine

| index_type | index_time (ms) | force_merge_time (ms) | QPS (multiple segs) | recall (multiple segs) | QPS (force_merge, 5 segs) | recall (force_merge, 5 segs) |
|---|---|---|---|---|---|---|
| cpu | 739290 | 151696 | 96 | 0.85 | 151 | 0.83 |
| gpu | 286267 | 45025 | 99 | 0.99 | 136 | 0.99 |

But for other datasets, there is no difference:

hotpotqa-arctic: 5.2M docs; 768 dims; 8 indexing threads

int8 dot_product

| index_type | index_time (ms) | force_merge_time (ms) | QPS (multiple segs) | recall (multiple segs) | QPS (force_merge, 5 segs) | recall (force_merge, 5 segs) |
|---|---|---|---|---|---|---|
| cpu | 565207 | 177059 | 147 | 0.67 | 224 | 0.69 |
| gpu | 357982 | 58152 | 124 | 0.88 | 167 | 0.89 |

int8 cosine

| index_type | index_time (ms) | force_merge_time (ms) | QPS (multiple segs) | recall (multiple segs) | QPS (force_merge, 5 segs) | recall (force_merge, 5 segs) |
|---|---|---|---|---|---|---|
| cpu | 600691 | 175769 | 154 | 0.68 | 217 | 0.68 |
| gpu | 303617 | 54235 | 117 | 0.87 | 160 | 0.86 |

```java
    }
    return new ES92GpuHnswVectorsFormat(hnswIndexOptions.m(), efConstruction);
} else if (indexOptions.getType() == DenseVectorFieldMapper.VectorIndexType.INT8_HNSW) {
    if (similarity == DenseVectorFieldMapper.VectorSimilarity.DOT_PRODUCT
```
A Member commented on the snippet above:
I think this should just silently change to cosine when using the GPU integration to build the index, then using DOT_PRODUCT on search would work just fine.

this way we only disallow max-inner-product.
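
A self-contained sketch of the behavior this suggests, with hypothetical names (this is not the actual Elasticsearch code nor the change in 60e1dc9): for the GPU int8_hnsw path, max_inner_product is rejected, dot_product is silently mapped to cosine for graph construction (float dot_product vectors are expected to be unit-length, for which the two orderings agree), and search-time scoring keeps the configured similarity.

```java
// Hypothetical sketch of similarity selection for the GPU int8_hnsw graph build.
// Enum and class names are illustrative, not Elasticsearch's real types.
enum Similarity { COSINE, EUCLIDEAN, DOT_PRODUCT, MAX_INNER_PRODUCT }

final class GpuInt8BuildSimilarity {

    /** Similarity to use when building the GPU graph from quantized byte vectors. */
    static Similarity forGraphBuild(Similarity configured) {
        switch (configured) {
            case MAX_INNER_PRODUCT:
                // Without per-vector corrections, quantized bytes cannot rank MIP reliably.
                throw new IllegalArgumentException(
                    "max_inner_product is not supported with int8_hnsw on GPU; use hnsw instead");
            case DOT_PRODUCT:
                // dot_product vectors are unit-length, so cosine gives the same ordering;
                // search-time scoring still uses dot_product.
                return Similarity.COSINE;
            default:
                // cosine and euclidean are unaffected by the dropped correction.
                return configured;
        }
    }

    public static void main(String[] args) {
        System.out.println(forGraphBuild(Similarity.DOT_PRODUCT)); // COSINE
        System.out.println(forGraphBuild(Similarity.EUCLIDEAN));   // EUCLIDEAN
    }
}
```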

@mayya-sharipova (Contributor Author) replied:

Good suggestion, I will follow up on this.

@mayya-sharipova (Contributor Author) replied:

Addressed in 60e1dc9

@mayya-sharipova removed the test-gpu label on Oct 22, 2025
@benwtrent (Member) left a comment:

I think this is good. I have one concern about element_type: byte vs. quantized element_type: float, which would both use CuVSMatrix.DataType.BYTE, right?

@mayya-sharipova added the auto-merge-without-approval label on Oct 22, 2025
@elasticsearchmachine merged commit 9634fd6 into elastic:main on Oct 22, 2025
34 checks passed
@mayya-sharipova deleted the gpu-avoid-dot-product-int8 branch on October 22, 2025 at 21:19
@elasticsearchmachine (Collaborator) commented:

💔 Backport failed

| Branch | Result |
|---|---|
| 9.2 | Commit could not be cherrypicked due to conflicts |

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 136881`.
