Skip to content

Add support for VAMANA index algorithm at index creation #430

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

mgravell
Copy link

@mgravell mgravell commented Jul 28, 2025

Note: the existing API only used Dictionary<string, object> attributes, which means that we could just add the enum addition and call it a day, and let the user worry about making everything work. However, it seems sensible to help the user out by providing an extended API with proper support for the options specific to each type, so:

  • add structured object-model for complex vector index creation, with specialized types for the 3 vector types
    • add VectorField.Dimensions, .Type and .DistanceMetric
    • add mechanism to bake those into command without breaking existing usage
      • AddDirectAttributes, DirectAttributeCount
    • new type FlatVectorField
    • new type HnswVectorField
    • new type SvsVanamaVectorField
  • make it possible to extract approx command from SerializedCommand.ToString()
  • add implicit operator from string to FieldName to reduce new overloads needed
  • add new Schema.Add*VectorField APIs
  • add tests to demonstrate correct index construction

Open design question: I've used "human" names in place of M, EF_CONSTRUCTION, etc; - and EuclideanDistance instead of L2 - is this agreeable? or should it be left "raw"?

Side: this fixes a broken test; in the existing logic, if attributes is not specified, then <index_attribute_count> (i.e. 0) is never added. In reality this won't break any working code because attributes is pretty-much needed (because that's the only way to specify DIM etc in the existing API)

mgravell added 3 commits July 28, 2025 16:16
… specialized types for the 3 vector types

  - add VectorField.Dimensions, .Type and .DistanceMetric
  - add mechanism to bake those into command without breaking existing usage
    - AddDirectAttributes, DirectAttributeCount
  - new type FlatVectorField
  - new type HnswVectorField
  - new type SvsVanamaVectorField
- make it possible to extract approx command from SerializedCommand.ToString()
- add implicit operator from string to FieldName to reduce new overloads needed
- add new Schema.Add*VectorField APIs
- add tests to demonstrate correct index construction
@mgravell mgravell changed the title Add support for VANAMA index algorithm at index creation Add support for VAMANA index algorithm at index creation Jul 31, 2025
Copy link
Collaborator

@atakavci atakavci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one nit.
and for your question around naming,, i believe there's significant value in keeping naming similar/closer -if not exact- across client and server boundaries.
That allows developers to quickly and instinctively map/correlate functionality when working with both sides of the application, especially during root causing issues or feature development. But this is rather a strategical desicion rather than my personal opinion on naming,, also aligning across libraries is a factor too.
So lets get more people involved into a conversation around this.

/// </summary>
public int TrainingThreshold { get; set; }
/// <summary>
/// The dimension used when using LeanVec compression for dimensionality reduction; defaults to dim/2 (applicable only with compression of type LeanVec, should always be < dim)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use of special chars break the IDE shows docs and hints. This also shows up as tons of warnings on github PR review page.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@uglide
Copy link
Contributor

uglide commented Aug 1, 2025

While I agree with Ali, for the builders, we can use more human-friendly names to improve DevEx. One note though, we should reference "raw names" in docstrings to help humans and LLMS to pick up appropriate attributes, enum values, etc

/// <summary>
/// Euclidean distance between two vectors.
/// </summary>
EuclideanDistance = 1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the doc strings to refer to Redis "raw" names.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@mgravell
Copy link
Author

mgravell commented Aug 1, 2025

LGTM, just one nit. and for your question around naming,, i believe there's significant value in keeping naming similar/closer -if not exact- across client and server boundaries. That allows developers to quickly and instinctively map/correlate functionality when working with both sides of the application, especially during root causing issues or feature development. But this is rather a strategical desicion rather than my personal opinion on naming,, also aligning across libraries is a factor too. So lets get more people involved into a conversation around this.

it is also possible to duplicate enum values in C#, i.e.

public enum VectorDistanceMetric {
   ...
   EuclideanDistance = 1,
   L2 = EuclideanDistance,
   // ...
}

with the result that VectorDistanceMetric.L2 and VectorDistanceMetric.EuclideanDistance both work, and are identical. The fun question is "what is someEnum.ToString() ? " - the simple answer is that it is undefined, and changes in different runtimes and platforms - but that doesn't impact our use, because the code does a switch itself (for other reasons; stability here is just a nice side-effect)

Not saying we should / shouldn't do that - just one more thing to consider.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants