[wwb] Add text reranking pipeline #2786
Conversation
Force-pushed b74fac3 to 97fa04a
Were you able to run
I didn't try with GenAI; with optimum-intel/HF, yes.
OK, just FYI: it's expected to fail with GenAI.
Pull Request Overview
This PR adds text reranking pipeline support to the who_what_benchmark (wwb) tool, enabling evaluation of text reranking models, with specialized handling for Qwen3 models. The implementation supports both OpenVINO GenAI and Optimum backends.
Key changes:
- Added "text-reranking" model type support across the wwb pipeline
- Implemented specialized Qwen3 model handling for CausalLM-based reranking architectures
- Added reranking evaluation metrics and test coverage
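The diff itself isn't shown here, but Qwen3-style rerankers are CausalLM models that score a (query, document) pair via the logits of a "yes" versus "no" answer token at the final position. A minimal, hypothetical sketch of that score computation, with dummy logits standing in for a real model (the actual wwb handling may differ):

```python
import math

def causal_lm_rerank_score(yes_logit: float, no_logit: float) -> float:
    """Relevance score = softmax probability of the 'yes' token
    over the {'yes', 'no'} pair at the last sequence position."""
    m = max(yes_logit, no_logit)  # subtract max for numerical stability
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)

# Dummy logits: the model leans strongly toward "relevant".
print(causal_lm_rerank_score(2.0, -1.0))  # ≈ 0.9526
```

This reduces to a sigmoid over the logit difference, which is why such models can be exported as sequence-classification heads (cf. the `-seq-cls` model variant mentioned in the description).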
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| wwb.py | Added text-reranking model type to CLI options and evaluation pipeline |
| whowhat_metrics.py | Implemented RerankingSimilarity metric for evaluating reranking performance |
| reranking_evaluator.py | New evaluator class for text reranking tasks with Qwen3 model support |
| model_loaders.py | Added reranking model loading functions for both GenAI and Optimum backends |
| __init__.py | Exported RerankingEvaluator class |
| test_cli_reranking.py | Added comprehensive test coverage for reranking functionality |
| requirements.txt | Added scipy dependency |
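The RerankingSimilarity implementation in whowhat_metrics.py is not shown on this page; given the new scipy dependency, a rank-correlation comparison of the two models' per-query document scores is one plausible shape for it. A self-contained sketch using a pure-Python Spearman correlation (no ties handled; scipy.stats.spearmanr would normally do this):

```python
def rank(scores):
    """Map each score to its rank (0 = lowest), assuming no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks

def spearman(base_scores, target_scores):
    """Spearman correlation between two per-document score lists (no ties)."""
    n = len(base_scores)
    rb, rt = rank(base_scores), rank(target_scores)
    d2 = sum((a - b) ** 2 for a, b in zip(rb, rt))
    return 1 - 6 * d2 / (n * (n * n - 1))

base = [0.91, 0.35, 0.77, 0.10]    # relevance scores from the base model
target = [0.89, 0.30, 0.80, 0.12]  # scores from the optimized model
print(spearman(base, target))      # 1.0 when the rankings agree exactly
```

A score of 1.0 means the optimized model orders the documents identically to the base model even if the raw scores differ, which is the property a reranking benchmark cares about.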
Comments suppressed due to low confidence (1)
tools/who_what_benchmark/whowhatbench/reranking_evaluator.py:1
- Corrected spelling of 'documets' to 'documents'.
from typing import Any, Union
Force-pushed 1f4d0d9 to 841a2ab
GT_FILE = tmp_path / "gt.csv"
MODEL_PATH = tmp_path / model_id.replace("/", "--")

result = subprocess.run(["optimum-cli", "export",
@apaniukov please give us more details
)
def test_reranking_basic(model_id, model_type, tmp_path):
    GT_FILE = tmp_path / "gt.csv"
    MODEL_PATH = tmp_path / model_id.replace("/", "--")
It seems that the WWB and GenAI tests use different replacement strategies: GenAI uses .replace("/", "_").
@sbalandi let's align in a separate PR
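The discrepancy flagged above is easy to see side by side (the model id is just an illustrative example):

```python
# Two path-sanitization conventions for turning a Hugging Face model id
# into a directory name: wwb tests use "--", GenAI tests use "_".
model_id = "cross-encoder/ms-marco-MiniLM-L2-v2"

wwb_path = model_id.replace("/", "--")
genai_path = model_id.replace("/", "_")

print(wwb_path)    # cross-encoder--ms-marco-MiniLM-L2-v2
print(genai_path)  # cross-encoder_ms-marco-MiniLM-L2-v2
```

Until the strategies are aligned, the same model exported by the two test suites lands in different cache directories.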
Force-pushed 841a2ab to 3fda7ea
Force-pushed 6e68ae5 to cec3c42
Force-pushed cec3c42 to cf1457b
Pull Request is not mergeable
b924dd5
Description
Added the possibility to run the text reranking pipeline in wwb, including some special logic for Qwen3 models.
Results are saved to a separate folder as one file.npy per generation.
Example run for cross-encoder/ms-marco-MiniLM-L2-v2, tomaarsen/Qwen3-Reranker-0.6B-seq-cls, or Qwen/Qwen3-Reranker-0.6B:
`wwb.py --base-model cross-encoder/ms-marco-MiniLM-L2-v2 --model-type text-reranking --gt-data gt_rerankings.csv`
Ticket: CVS-172049
Checklist:
- [x] Tests have been updated or added to cover the new code
- [ ] This patch fully addresses the ticket.
- [ ] I have made corresponding changes to the documentation