We need a realistic nightly track for nested vector indexing and retrieval. The task is to build a "semantic text"-style benchmark that indexes passage vectors with a varying number of passages per document (from 1, to tens, to hundreds).
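A minimal sketch of what the corpus side could look like, assuming a nested field holding one `dense_vector` per passage. The field names (`passages`, `passages.text`, `passages.emb`), the tiny `DIMS`, the random vectors and the three-bucket passage-count distribution are all hypothetical placeholders for illustration, not a decided track design.

```python
import random

# Hypothetical dimensionality; a real track would use the embedding model's size.
DIMS = 8

# Hypothetical index mapping: each document holds a nested list of passages,
# and every passage carries its own indexed dense_vector.
MAPPING = {
    "mappings": {
        "properties": {
            "passages": {
                "type": "nested",
                "properties": {
                    "text": {"type": "text"},
                    "emb": {
                        "type": "dense_vector",
                        "dims": DIMS,
                        "index": True,
                        "similarity": "cosine",
                    },
                },
            }
        }
    }
}


def passage_count(rng: random.Random) -> int:
    """Pick a passage count from one of three buckets: 1, tens, or hundreds."""
    bucket = rng.choice(["single", "tens", "hundreds"])
    if bucket == "single":
        return 1
    if bucket == "tens":
        return rng.randint(10, 99)
    return rng.randint(100, 500)


def make_doc(doc_id: int, rng: random.Random) -> dict:
    """Build one document source with a random number of placeholder passages."""
    return {
        "passages": [
            {
                "text": f"doc {doc_id} passage {i}",
                # Random vectors stand in for real passage embeddings.
                "emb": [rng.uniform(-1.0, 1.0) for _ in range(DIMS)],
            }
            for i in range(passage_count(rng))
        ]
    }
```

Seeding the `random.Random` instance keeps the generated corpus reproducible between nightly runs.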
The queries should cover:
- "Semantic text"-style searches (probably plain kNN, if we want to keep inference out of the benchmark)
- kNN with passage highlighting
- kNN with inner_hits scoring (I have been seeing customers do this).
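As a rough sketch of the inner_hits variant, the search body below runs a top-level kNN search against a nested `dense_vector` field and asks for the best-matching passages of each parent document via `inner_hits`. It assumes the same hypothetical `passages.emb` / `passages.text` field names as above, and the default `k`, `num_candidates` and `passages_per_hit` values are arbitrary; the body is only built and printed here, not sent to a cluster.

```python
import json


def nested_knn_query(query_vector, k=10, num_candidates=100, passages_per_hit=3):
    """Build a kNN search body over a nested dense_vector field.

    Field names are hypothetical. inner_hits returns the top-scoring
    passages for each matching parent document.
    """
    if num_candidates < k:
        raise ValueError("num_candidates must be >= k")
    return {
        "_source": False,
        "knn": {
            "field": "passages.emb",
            "query_vector": list(query_vector),
            "k": k,
            "num_candidates": num_candidates,
            "inner_hits": {
                "_source": False,
                "fields": ["passages.text"],
                "size": passages_per_hit,
            },
        },
    }


body = nested_knn_query([0.1] * 8, k=5, num_candidates=50)
print(json.dumps(body, indent=2))
```

Dropping the `inner_hits` block gives the plain kNN variant, so both query shapes could share one parameterized builder in the track.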
Future / out of scope