
Commit fce332b

Author: chenxwh
Commit message: update prompt template
1 parent 4ea811d commit fce332b

File tree: 1 file changed (+44, -0)


src/fairseq2/recipes/lm/_online_finetune/_generative_judge.py

Lines changed: 44 additions & 0 deletions
@@ -195,6 +195,50 @@
 {reference_answer}
 """
 
+KWISE_WITH_SCORES_J1_PROMPT = """
+You are given a user question and {k} responses from {k} AI assistants. Your task is to act as an impartial judge and evaluate which response best follows the user's instructions and provides the highest-quality answer. Avoid any position biases and ensure that the order in which the responses were presented does not influence your decision. Do not allow the length of the responses to influence your evaluation. Do not favor certain names of the assistants. Be as objective as possible.
+
+Think carefully about how to assess the quality of the responses and, finally, assign each response a score from 0 to 10, using either an integer or a decimal with up to 0.1 precision, with a higher score indicating a higher-quality response that better satisfies the criteria. Enclose the scores within the tags <score_assistant_1> </score_assistant_1>, <score_assistant_2> </score_assistant_2>, and so on.
+
+Format your output like this:
+<think> your_thinking_process </think>
+<score_assistant_1> your_score_1 </score_assistant_1>
+<score_assistant_2> your_score_2 </score_assistant_2>
+<score_assistant_3> your_score_3 </score_assistant_3>
+...
+
+Below are the user's question and the responses:
+
+[User Question]
+{instruction}
+
+{responses}
+"""
+
+KWISE_WITH_SCORES_J1_PROMPT_WITH_REF_ANSWER = """
+You are given a user question and {k} responses from {k} AI assistants. Your task is to act as an impartial judge and evaluate which response best follows the user's instructions and provides the highest-quality answer. Avoid any position biases and ensure that the order in which the responses were presented does not influence your decision. Do not allow the length of the responses to influence your evaluation. Do not favor certain names of the assistants. Be as objective as possible.
+
+Think carefully about how to assess the quality of the responses, utilizing the reference answer in your judgement.
+Finally, assign each response a score of 1 if the response is correct and 0 if not. Enclose the scores within the tags <score_assistant_1> </score_assistant_1>, <score_assistant_2> </score_assistant_2>, and so on.
+
+Format your output like this:
+<think> your_thinking_process </think>
+<score_assistant_1> 0 or 1 </score_assistant_1>
+<score_assistant_2> 0 or 1 </score_assistant_2>
+<score_assistant_3> 0 or 1 </score_assistant_3>
+...
+
+Below are the user's question and the responses:
+
+[User Question]
+{instruction}
+
+[Reference Answer]
+{reference_answer}
+
+{responses}
+"""
+
 import re
 from abc import ABC, abstractmethod
 from typing import Any
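For context, here is a minimal sketch of how a caller might fill the `{responses}` slot and recover the scores these templates ask the judge to emit. The `format_responses` and `parse_scores` helpers (and the `[Assistant N's Response]` header style) are hypothetical illustrations, not part of this commit; the actual recipe code may structure `{responses}` differently.

```python
import re

def format_responses(responses):
    # Hypothetical formatting for the {responses} slot: number each
    # candidate so it lines up with the <score_assistant_N> tags.
    return "\n\n".join(
        f"[Assistant {i}'s Response]\n{r}"
        for i, r in enumerate(responses, start=1)
    )

def parse_scores(judge_output, k):
    # Extract the k scores the prompt asks for; each score is an integer
    # or a decimal with up to 0.1 precision. Missing tags yield None.
    scores = []
    for i in range(1, k + 1):
        m = re.search(
            rf"<score_assistant_{i}>\s*([0-9]+(?:\.[0-9])?)\s*</score_assistant_{i}>",
            judge_output,
        )
        scores.append(float(m.group(1)) if m else None)
    return scores

sample = (
    "<think> Assistant 2 is more complete. </think>\n"
    "<score_assistant_1> 7.5 </score_assistant_1>\n"
    "<score_assistant_2> 9 </score_assistant_2>"
)
print(parse_scores(sample, 2))  # → [7.5, 9.0]
```

The regex tolerates whitespace inside the tags, since judge models rarely reproduce the requested format byte-for-byte; a production parser would also want to handle out-of-range or malformed scores.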
