[Text]Add accuracy test for model Mistral-7B-Instruct-v0.1 #3742

MrZ20 · 2025-10-25T03:22:39Z

What this PR does / why we need it?

Add accuracy test for model Mistral-7B-Instruct-v0.1

Does this PR introduce any user-facing change?

How was this patch tested?

vLLM version: v0.11.0rc3
vLLM main: vllm-project/vllm@c9461e0

gemini-code-assist

Code Review

This pull request adds a new accuracy test for the Mistral-7B-Instruct-v0.1 model. My review focuses on the correctness and security of the new test configuration. I've identified a critical security issue with the use of trust_remote_code: True and the use of a mirrored model from a non-official source. I recommend switching to the official mistralai model, which also resolves the security concern. Additionally, I've pointed out that the expected accuracy metrics in the test are significantly lower than published benchmarks, which could weaken the test's ability to catch regressions. I've provided suggestions to address these points.

gemini-code-assist · 2025-10-25T03:24:15Z

tests/e2e/models/configs/Mistral-7B-Instruct-v0.1.yaml

+model_name: "AI-ModelScope/Mistral-7B-Instruct-v0.1"
+runner: "linux-aarch64-a2-1"
+hardware: "Atlas A2 Series"
+tasks:
+- name: "gsm8k"
+  metrics:
+  - name: "exact_match,strict-match"
+    value: 0.35
+  - name: "exact_match,flexible-extract"
+    value: 0.38
+trust_remote_code: True
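The two metric names in this config refer to two answer-extraction filters used when scoring gsm8k. The sketch below approximates both filters in plain Python to show why flexible-extract can score higher than strict-match on the same completions. The regex patterns are modeled on lm-evaluation-harness's gsm8k task configuration, and the helper names are hypothetical. Check them against the installed lm-eval version rather than treating this as the harness's implementation.

```python
import re
from typing import Optional

# Approximations of the two gsm8k answer filters. The patterns are modeled on
# lm-evaluation-harness's gsm8k task config (assumption: verify against your
# installed lm-eval version).
STRICT_PATTERN = re.compile(r"#### (\-?[0-9\.\,]+)")
FLEXIBLE_PATTERN = re.compile(r"(-?[$0-9.,]{2,})|(-?[0-9]+)")


def normalize(answer: str) -> str:
    """Drop commas, dollar signs, and a trailing period before comparing."""
    return answer.replace(",", "").replace("$", "").rstrip(".")


def strict_extract(completion: str) -> Optional[str]:
    """Accept only an answer written after the '#### ' marker."""
    match = STRICT_PATTERN.search(completion)
    return match.group(1) if match else None


def flexible_extract(completion: str) -> Optional[str]:
    """Take the last number-like token anywhere in the completion."""
    matches = FLEXIBLE_PATTERN.findall(completion)
    if not matches:
        return None
    # findall yields one tuple per match (one slot per alternative);
    # exactly one slot is non-empty, so keep that one.
    first_alt, second_alt = matches[-1]
    return first_alt or second_alt


def exact_match(extracted: Optional[str], target: str) -> bool:
    """Compare an extracted answer with the reference after normalization."""
    return extracted is not None and normalize(extracted) == normalize(target)


completion = "3 + 4 = 7, so she has 7 apples."
print(exact_match(strict_extract(completion), "7"))    # False: no "####" marker
print(exact_match(flexible_extract(completion), "7"))  # True: last number is 7
```

A completion that reaches the right number without the "#### <answer>" marker counts only under flexible-extract. This is consistent with the flexible score in this config (0.38) sitting above the strict one (0.35).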


Using a mirrored model from AI-ModelScope and setting trust_remote_code: True introduces security and correctness risks. Setting trust_remote_code: True allows arbitrary code from the model repository to be executed, so it should be avoided. It's highly recommended to use the official mistralai/Mistral-7B-Instruct-v0.1 model instead: it does not require this flag, which makes it more secure, and it ensures you are testing against the canonical model version. Please update the model name and remove the trust_remote_code setting.

model_name: "mistralai/Mistral-7B-Instruct-v0.1"
runner: "linux-aarch64-a2-1"
hardware: "Atlas A2 Series"
tasks:
- name: "gsm8k"
  metrics:
  - name: "exact_match,strict-match"
    value: 0.35
  - name: "exact_match,flexible-extract"
    value: 0.38
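The two issues raised in this comment can be checked mechanically. The sketch below is a hypothetical pre-merge check over a config that has already been parsed into a Python dict. Parsing the YAML file itself is left out to stay within the standard library. The OFFICIAL_ORGS allow-list and the review_config name are illustrative assumptions, not part of vllm-ascend.

```python
from typing import Any, Dict, List

# Hypothetical allow-list of organizations treated as canonical model sources.
# This is an illustrative assumption, not a vllm-ascend setting.
OFFICIAL_ORGS = {"mistralai"}


def review_config(config: Dict[str, Any]) -> List[str]:
    """Return warnings for the two issues raised in this review comment."""
    warnings: List[str] = []
    # Flag any config that opts into executing code from the model repository.
    if config.get("trust_remote_code", False):
        warnings.append("trust_remote_code is enabled")
    # The organization is the part of model_name before the first "/".
    org = str(config.get("model_name", "")).split("/", 1)[0]
    if org not in OFFICIAL_ORGS:
        warnings.append(f"model_name is not from an official org: {org!r}")
    return warnings


original = {
    "model_name": "AI-ModelScope/Mistral-7B-Instruct-v0.1",
    "trust_remote_code": True,
}
suggested = {"model_name": "mistralai/Mistral-7B-Instruct-v0.1"}

print(review_config(original))   # two warnings
print(review_config(suggested))  # []
```

Keeping the check as a pure function over a dict means it can run against any loader's output without tying the sketch to a particular YAML library.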

gemini-code-assist · 2025-10-25T03:24:15Z

tests/e2e/models/configs/Mistral-7B-Instruct-v0.1.yaml

+  - name: "exact_match,strict-match"
+    value: 0.35
+  - name: "exact_match,flexible-extract"
+    value: 0.38


The expected accuracy values for gsm8k (35% and 38%) are significantly lower than the published score of 42.7% for Mistral-7B-Instruct-v0.1 on this benchmark (with 8-shot, which appears to be the setting used here). Using such a low expectation for accuracy can mask future performance regressions. For example, if the model performance degrades but remains above this low threshold, the test would still pass. It is recommended to investigate the cause of this discrepancy, which might be related to using a mirrored model. The expected values should be as close as possible to the actual measured performance to make the test meaningful for detecting regressions.
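To make the masking argument concrete, the sketch below compares two ways a harness might judge a measured score against the configured value: a plain lower bound and a relative tolerance. The function names, the RTOL value, and the measured scores are all hypothetical. This is not a description of how the vllm-ascend harness actually compares metrics.

```python
# Hypothetical relative tolerance; the real harness may use a different value
# or a different comparison altogether.
RTOL = 0.03


def passes_lower_bound(measured: float, expected: float) -> bool:
    """Pass whenever the measured score is at least the expected value."""
    return measured >= expected


def passes_relative_tolerance(measured: float, expected: float, rtol: float = RTOL) -> bool:
    """Pass only when the measured score stays within rtol of the expected value."""
    return abs(measured - expected) <= rtol * abs(expected)


# Hypothetical scenario: the model's real score sits near the 42.7% cited
# above, then a regression drops it to 0.36.
regressed = 0.36
print(passes_lower_bound(regressed, 0.35))          # True: regression missed
print(passes_relative_tolerance(regressed, 0.35))   # True: stale baseline hides it
print(passes_relative_tolerance(regressed, 0.427))  # False: regression caught
```

With a stale baseline of 0.35, the regressed score passes under either rule. The tolerance check flags the drop only when the expected value tracks the model's real score.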

github-actions · 2025-10-25T03:24:28Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing; smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

github-actions · 2025-10-25T03:25:06Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: MrZ20 <[email protected]>

gemini-code-assist bot reviewed Oct 25, 2025


github-actions bot added merge-conflicts module:tests labels Oct 25, 2025

add acc test

87851ce

Signed-off-by: MrZ20 <[email protected]>

MrZ20 force-pushed the Mistral_7B_Instruct_v0.1 branch from c2ff1d2 to 87851ce on October 27, 2025 11:50

github-actions bot removed the merge-conflicts label Oct 27, 2025
