[wwb] Add text reranking pipeline #2786
Conversation
Force-pushed b74fac3 to 97fa04a
Were you able to run
I didn't try with GenAI; with optimum-intel/HF, yes.
OK, just FYI: it's expected to fail with GenAI.
Pull Request Overview
This PR adds text reranking pipeline support to the who_what_benchmark (wwb) tool, enabling evaluation of text reranking models, with specialized handling for Qwen3 models. The implementation supports both OpenVINO GenAI and Optimum backends.
Key changes:
- Added "text-reranking" model type support across the wwb pipeline
- Implemented specialized Qwen3 model handling for CausalLM-based reranking architectures
- Added reranking evaluation metrics and test coverage
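The diff itself isn't shown here, but Qwen3-style rerankers are CausalLM models that score a (query, document) pair via the logits of a "yes" versus "no" answer token at the final position. A minimal, hypothetical sketch of that score computation, with dummy logits standing in for a real model (the actual wwb handling may differ):

```python
import math

def causal_lm_rerank_score(yes_logit: float, no_logit: float) -> float:
    """Relevance score = softmax probability of the 'yes' token
    over the {'yes', 'no'} pair at the last sequence position."""
    m = max(yes_logit, no_logit)  # subtract max for numerical stability
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)

# Dummy logits: the model leans strongly toward "relevant".
print(causal_lm_rerank_score(2.0, -1.0))  # ≈ 0.9526
```

This reduces to a sigmoid over the logit difference, which is why such models can be exported as sequence-classification heads (cf. the `-seq-cls` model variant mentioned in the description).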
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| wwb.py | Added text-reranking model type to CLI options and evaluation pipeline |
| whowhat_metrics.py | Implemented RerankingSimilarity metric for evaluating reranking performance |
| reranking_evaluator.py | New evaluator class for text reranking tasks with Qwen3 model support |
| model_loaders.py | Added reranking model loading functions for both GenAI and Optimum backends |
| __init__.py | Exported RerankingEvaluator class |
| test_cli_reranking.py | Added comprehensive test coverage for reranking functionality |
| requirements.txt | Added scipy dependency |
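The RerankingSimilarity implementation in whowhat_metrics.py is not shown on this page; given the new scipy dependency, a rank-correlation comparison of the two models' per-query document scores is one plausible shape for it. A self-contained sketch using a pure-Python Spearman correlation (no ties handled; scipy.stats.spearmanr would normally do this):

```python
def rank(scores):
    """Map each score to its rank (0 = lowest), assuming no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks

def spearman(base_scores, target_scores):
    """Spearman correlation between two per-document score lists (no ties)."""
    n = len(base_scores)
    rb, rt = rank(base_scores), rank(target_scores)
    d2 = sum((a - b) ** 2 for a, b in zip(rb, rt))
    return 1 - 6 * d2 / (n * (n * n - 1))

base = [0.91, 0.35, 0.77, 0.10]    # relevance scores from the base model
target = [0.89, 0.30, 0.80, 0.12]  # scores from the optimized model
print(spearman(base, target))      # 1.0 when the rankings agree exactly
```

A score of 1.0 means the optimized model orders the documents identically to the base model even if the raw scores differ, which is the property a reranking benchmark cares about.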
Comments suppressed due to low confidence (1)
tools/who_what_benchmark/whowhatbench/reranking_evaluator.py:1
- Corrected spelling of 'documets' to 'documents'.
from typing import Any, Union
Force-pushed 1f4d0d9 to 841a2ab
GT_FILE = tmp_path / "gt.csv"
MODEL_PATH = tmp_path / model_id.replace("/", "--")

result = subprocess.run(["optimum-cli", "export",
@apaniukov please give us more details
)
def test_reranking_basic(model_id, model_type, tmp_path):
    GT_FILE = tmp_path / "gt.csv"
    MODEL_PATH = tmp_path / model_id.replace("/", "--")
It seems that the WWB and GenAI tests use different replacement strategies: GenAI uses .replace("/", "_").
@sbalandi let's align in a separate PR
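The discrepancy flagged above is easy to see side by side (the model id is just an illustrative example):

```python
# Two path-sanitization conventions for turning a Hugging Face model id
# into a directory name: wwb tests use "--", GenAI tests use "_".
model_id = "cross-encoder/ms-marco-MiniLM-L2-v2"

wwb_path = model_id.replace("/", "--")
genai_path = model_id.replace("/", "_")

print(wwb_path)    # cross-encoder--ms-marco-MiniLM-L2-v2
print(genai_path)  # cross-encoder_ms-marco-MiniLM-L2-v2
```

Until the strategies are aligned, the same model exported by the two test suites lands in different cache directories.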
Force-pushed 841a2ab to 3fda7ea
Force-pushed 6e68ae5 to cec3c42
Force-pushed cec3c42 to cf1457b
Pull Request is not mergeable
b924dd5
Description
Added the possibility to run the text reranking pipeline in wwb, including some special logic for Qwen3 models.
Results are saved to a separate folder as one file.npy per generation.
Example run for cross-encoder/ms-marco-MiniLM-L2-v2, tomaarsen/Qwen3-Reranker-0.6B-seq-cls, or Qwen/Qwen3-Reranker-0.6B:
`wwb.py --base-model cross-encoder/ms-marco-MiniLM-L2-v2 --model-type text-reranking --gt-data gt_rerankings.csv`
Ticket: CVS-172049
Checklist:
- [x] Tests have been updated or added to cover the new code
- [ ] This patch fully addresses the ticket.
- [ ] I have made corresponding changes to the documentation