Randomise the query order in the MS Marco V2 param supplier #1025

Open
davidkyle wants to merge 1 commit into elastic:master from davidkyle:shuffle

@davidkyle
Member

If a challenge contains many search operations that use this supplier, and the supplier is recreated for each one, then every operation starts from the same hot vectors at the beginning of the file. This change shuffles the vectors after they are read from the file.

for vector_query in queries_file:
    self._queries.append(json.loads(vector_query))

# random.shuffle reorders the list in place and returns None, so the result is not assigned
random.shuffle(self._queries)
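
For reference, a minimal standalone sketch of the read-then-shuffle step. The function name `load_shuffled_queries` and the in-memory sample file are hypothetical, used only for illustration; the one-JSON-vector-per-line layout is assumed from the snippet above. Note that `random.shuffle` works in place and returns `None`.

```python
import io
import json
import random


def load_shuffled_queries(queries_file):
    """Read one JSON-encoded query vector per line and return them shuffled."""
    # Skip blank lines so a trailing newline does not break json.loads.
    queries = [json.loads(line) for line in queries_file if line.strip()]
    # random.shuffle reorders the list in place and returns None,
    # so its return value must not be assigned back to the list.
    random.shuffle(queries)
    return queries


# Hypothetical in-memory stand-in for the real queries file.
sample_file = io.StringIO("[0.1, 0.2]\n[0.3, 0.4]\n[0.5, 0.6]\n")
shuffled = load_shuffled_queries(sample_file)
```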
Contributor

I wonder if it makes sense to allow a settable seed, so that the shuffle ordering can be reproduced, and to print the seed in the logs.
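
A rough sketch of what that could look like, not a claim about how the supplier should be wired up. The helper `shuffle_with_seed` and its `seed` parameter are hypothetical names; only `random.Random`, `random.randrange`, and the standard `logging` module are real APIs here.

```python
import logging
import random

logger = logging.getLogger(__name__)


def shuffle_with_seed(queries, seed=None):
    """Return a shuffled copy of queries plus the seed that produced it."""
    if seed is None:
        # No seed supplied: pick one so the ordering can still be reproduced later.
        seed = random.randrange(2**32)
    # Log the seed so a run can be repeated with the same ordering.
    logger.info("Shuffling %d queries with seed %d", len(queries), seed)
    # A dedicated Random instance avoids touching the global random state.
    rng = random.Random(seed)
    shuffled = list(queries)
    rng.shuffle(shuffled)
    return shuffled, seed
```

Passing the logged seed back in as `seed` reproduces the same ordering, and leaving it unset keeps the randomised behaviour of this change.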

2 participants