-
Notifications
You must be signed in to change notification settings - Fork 47
Add support for VAMANA index algorithm at index creation #430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
… specialized types for the 3 vector types - add VectorField.Dimensions, .Type and .DistanceMetric - add mechanism to bake those into command without breaking existing usage - AddDirectAttributes, DirectAttributeCount - new type FlatVectorField - new type HnswVectorField - new type SvsVanamaVectorField - make it possible to extract approx command from SerializedCommand.ToString() - add implicit operator from string to FieldName to reduce new overloads needed - add new Schema.Add*VectorField APIs - add tests to demonstrate correct index construction
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just one nit.
and for your question around naming,, i believe there's significant value in keeping naming similar/closer -if not exact- across client and server boundaries.
That allows developers to quickly and instinctively map/correlate functionality when working with both sides of the application, especially during root causing issues or feature development. But this is rather a strategical desicion rather than my personal opinion on naming,, also aligning across libraries is a factor too.
So lets get more people involved into a conversation around this.
src/NRedisStack/Search/Schema.cs
Outdated
/// </summary> | ||
public int TrainingThreshold { get; set; } | ||
/// <summary> | ||
/// The dimension used when using LeanVec compression for dimensionality reduction; defaults to dim/2 (applicable only with compression of type LeanVec, should always be < dim) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use of special chars break the IDE shows docs and hints. This also shows up as tons of warnings on github PR review page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
While I agree with Ali, for the builders, we can use more human-friendly names to improve DevEx. One note though, we should reference "raw names" in docstrings to help humans and LLMS to pick up appropriate attributes, enum values, etc |
/// <summary> | ||
/// Euclidean distance between two vectors. | ||
/// </summary> | ||
EuclideanDistance = 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the doc strings to refer to Redis "raw" names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
it is also possible to duplicate enum values in C#, i.e. public enum VectorDistanceMetric {
...
EuclideanDistance = 1,
L2 = EuclideanDistance,
// ...
} with the result that Not saying we should / shouldn't do that - just one more thing to consider. |
- fix LeanVec command typo
Note: the existing API only used
Dictionary<string, object> attributes
, which means that we could just add the enum addition and call it a day, and let the user worry about making everything work. However, it seems sensible to help the user out by providing an extended API with proper support for the options specific to each type, so:Open design question: I've used "human" names in place of
M
,EF_CONSTRUCTION
, etc; - andEuclideanDistance
instead ofL2
- is this agreeable? or should it be left "raw"?Side: this fixes a broken test; in the existing logic, if
attributes
is not specified, then<index_attribute_count>
(i.e.0
) is never added. In reality this won't break any working code becauseattributes
is pretty-much needed (because that's the only way to specifyDIM
etc in the existing API)