[WIP] Issue 4447: batched cosine / euclidean procedure for more efficient computation of vector similarities #4458
Fixes #4447
[WIP]
Note:
To use the vector type at the moment, we have to test it with an Enterprise edition and this config:
The `genai.vector.cosine` function doesn't exist; I think what is intended is `genai.vector.encode` together with `vector.similarity.cosine`.
I don't think there is an advantage in creating a new APOC procedure, since using this Cypher query:
is faster than the following:
Whereas with the encode function we have to execute, e.g.:
I tried to replicate the implementation behind `vector.similarity.cosine` from here, but it is quite hard to understand and may be part of non-public source code.
At this time, the Java Vector API / SIMD is not feasible, or rather not useful: it operates on arrays like float[], double[] ... (as documented here and here), so the vector would first have to be copied into such an array, and that copy produces a bottleneck.
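For reference, the kind of array-based kernel that a batched cosine / euclidean procedure (or the Java Vector API) would operate on can be sketched as plain scalar Java loops over float[]. This is a minimal sketch under stated assumptions, not the APOC or Neo4j implementation: the class and method names (`BatchedSimilarity`, `batchCosine`, `batchEuclidean`) are hypothetical, no SIMD is used, and it returns the raw cosine value and the plain Euclidean distance, which may differ from the scores returned by Neo4j's `vector.similarity.*` functions.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not the APOC or Neo4j implementation): scores one query
 * vector against a batch of candidate vectors stored as float[] arrays.
 * Plain scalar loops are used; no Java Vector API / SIMD.
 */
public final class BatchedSimilarity {

    private BatchedSimilarity() {}

    private static void checkDims(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                "dimension mismatch: " + a.length + " vs " + b.length);
        }
    }

    /** Sum of squares, accumulated in double to limit rounding error. */
    private static double squaredNorm(float[] v) {
        double sum = 0.0;
        for (float x : v) {
            sum += (double) x * x;
        }
        return sum;
    }

    /** Raw cosine similarity in [-1, 1] per candidate; 0.0 when either norm is zero. */
    public static double[] batchCosine(float[] query, float[][] candidates) {
        // The query norm is computed once per batch instead of once per pair.
        double queryNorm = Math.sqrt(squaredNorm(query));
        double[] scores = new double[candidates.length];
        for (int c = 0; c < candidates.length; c++) {
            float[] v = candidates[c];
            checkDims(query, v);
            double dot = 0.0;
            double vSquared = 0.0;
            for (int i = 0; i < v.length; i++) {
                dot += (double) query[i] * v[i];
                vSquared += (double) v[i] * v[i];
            }
            double denom = queryNorm * Math.sqrt(vSquared);
            scores[c] = denom == 0.0 ? 0.0 : dot / denom;
        }
        return scores;
    }

    /** Plain Euclidean distance per candidate (a distance, not a normalized score). */
    public static double[] batchEuclidean(float[] query, float[][] candidates) {
        double[] distances = new double[candidates.length];
        for (int c = 0; c < candidates.length; c++) {
            float[] v = candidates[c];
            checkDims(query, v);
            double sum = 0.0;
            for (int i = 0; i < v.length; i++) {
                double d = (double) query[i] - v[i];
                sum += d * d;
            }
            distances[c] = Math.sqrt(sum);
        }
        return distances;
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        float[][] batch = {{1f, 0f}, {0f, 1f}, {3f, 4f}};
        System.out.println(Arrays.toString(batchCosine(query, batch))); // [1.0, 0.0, 0.6]
        System.out.println(Arrays.toString(batchEuclidean(query, batch)));
    }
}
```

The only real saving over calling a pairwise function once per candidate is computing the query norm once per batch; the inputs still have to be float[], which is the copy identified above as the bottleneck.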
For now, we can only access a specific index and retrieve the float/double value at that position of the vector (see here); we can't access the entire `coordinates`. In any case, the pure Cypher method would probably still be better, since it leverages the Java Vector API as well; in fact, if we execute the `vector.similarity.cosine` function, the following message is printed:
With the `vector.similarity.cosine` function using the vector data type, the multiple conversions before the similarity computation no longer seem to be present, since it compares the vector data types directly instead of converting, e.g., the List coming from the `genai.vector.encode` results.
Times:
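The copy steps discussed in this description (reading a vector value one index at a time, and converting the List coming from `genai.vector.encode` results into an array) can be sketched in Java as follows. This is a hypothetical illustration: `IndexedVector` is an invented stand-in, not a Neo4j API, and the sketch assumes the embedding reaches Java code as a `java.util.List` of numbers.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: the copies needed before an array-based kernel can run. */
public final class VectorMaterialization {

    private VectorMaterialization() {}

    /** Hypothetical stand-in (not a Neo4j API) for a vector exposing only per-index reads. */
    public interface IndexedVector {
        int dimensions();
        double get(int index);
    }

    /** Copies a List of numbers (e.g. an embedding passed as a list) into a float[]. */
    public static float[] fromList(List<? extends Number> values) {
        float[] out = new float[values.size()];
        int i = 0;
        // Iterate instead of calling get(i), so linked lists stay O(n).
        for (Number n : values) {
            out[i++] = n.floatValue(); // unboxing plus narrowing on every element
        }
        return out;
    }

    /** Copies an index-only vector into a float[], one get() call per coordinate. */
    public static float[] fromIndexed(IndexedVector v) {
        float[] out = new float[v.dimensions()];
        for (int i = 0; i < out.length; i++) {
            out[i] = (float) v.get(i);
        }
        return out;
    }

    public static void main(String[] args) {
        float[] a = fromList(List.of(0.5, 1.0, -2.0));
        IndexedVector iv = new IndexedVector() {
            private final double[] data = {0.5, 1.0, -2.0};
            public int dimensions() { return data.length; }
            public double get(int index) { return data[index]; }
        };
        float[] b = fromIndexed(iv);
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```

Both paths allocate a fresh array and touch every coordinate before any similarity is computed; a function that compares vector values directly can skip this step entirely.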