-You are given a user question and a response from an AI assistant. Your task is to act as an impartial judge and evaluate how well the response fulfills the user's instructions. You will be shown multiple responses to the same prompt, but only one at a time. Evaluate each response independently.
+You are given a user question and a response from an AI assistant. Your task is to act as an impartial judge and evaluate how well the response fulfills the user's instructions. Do not allow the length of the response to influence your evaluation. Do not favor certain names of the assistants. Be as objective as possible.

 Think carefully about how to assess the quality of the response and finally assign the assistant's response a score from 0 to 10, using either an integer or a decimal with up to 0.1 precision. A higher score should indicate a higher-quality response. Enclose the score within <score> and </score> tags.

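The pointwise prompt above asks the judge to wrap a 0 to 10 score in <score> tags. A minimal sketch of how such a judgment could be parsed follows; `parse_pointwise_score` is a hypothetical helper that does not appear in this diff, and taking the last tag and rejecting out-of-range values are assumptions, not the repository's confirmed behavior.

```python
import re
from typing import Optional

# Hypothetical helper for illustration only; not part of the diff.
# Matches an integer or decimal score wrapped in <score> ... </score>.
_SCORE_RE = re.compile(r"<score>\s*(\d+(?:\.\d+)?)\s*</score>")


def parse_pointwise_score(judgment: str) -> Optional[float]:
    """Return the last <score> value in the judgment, or None if unusable."""
    matches = _SCORE_RE.findall(judgment)
    if not matches:
        return None  # the judge never emitted a well-formed score tag
    score = float(matches[-1])  # assumption: the final tag is the verdict
    # The prompt restricts scores to the 0-10 range; reject anything else.
    return score if 0.0 <= score <= 10.0 else None
```

Returning None rather than raising lets a caller decide whether to drop or retry a malformed judgment.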
@@ -110,6 +110,72 @@
 [The End of Assistant B's Answer]
 """

+KWISE_WITH_SCORES_J1_PROMPT = """
+You are given a user question and {k} responses from {k} AI assistants. Your task is to act as an impartial judge and evaluate which response better follows the user's instructions and provides a higher-quality answer. Avoid any position biases and ensure that the order in which the responses were presented does not influence your decision. Do not allow the length of the responses to influence your evaluation. Do not favor certain names of the assistants. Be as objective as possible.
+
+Think carefully about how to assess the quality of the responses and finally, assign each response a score from 0 to 10, using either an integer or a decimal with up to 0.1 precision, with a higher score indicating a higher-quality response that better satisfies the criteria. Enclose the scores within the tags <score_assistant_1> </score_assistant_1>, <score_assistant_2> </score_assistant_2> and so on.
+Below are the user's question and the two responses:
+
+[User Question]
+{instruction}
+
+{responses}
+"""
+
+KWISE_WITH_SCORES_J1_PROMPT_WITH_REF_ANSWER = """
+You are given a user question and {k} responses from {k} AI assistants. Your task is to act as an impartial judge and evaluate which response better follows the user's instructions and provides a higher-quality answer. Avoid any position biases and ensure that the order in which the responses were presented does not influence your decision. Do not allow the length of the responses to influence your evaluation. Do not favor certain names of the assistants. Be as objective as possible.
+
+Think carefully about how to assess the quality of the responses and finally, assign each response a score from 0 to 10, using either an integer or a decimal with up to 0.1 precision, with a higher score indicating a higher-quality response that better satisfies the criteria. Enclose the scores within the tags <score_assistant_1> </score_assistant_1>, <score_assistant_2> </score_assistant_2> and so on.
+Below are the user's question and the two responses:
+
+[User Question]
+{instruction}
+
+[Reference Answer]
+{reference_answer}
+
+{responses}
+"""
+
+# PAIRWISE_WITH_SCORES_J1_PROMPT = """
+# You are given a user question and two responses from two AI assistants. You are also given their thinking process. Your task is to act as an impartial judge and evaluate which response better follows the user's instructions and provides a higher-quality answer. Avoid any position biases and ensure that the order in which the responses were presented does not influence your decision. Do not allow the length of the responses to influence your evaluation. Do not favor certain names of the assistants. Be as objective as possible.
+
+# Carefully analyze the assistants' thought process, assess the quality of the responses and finally, assign each response a score from 0 to 10, using either an integer or a decimal with up to 0.1 precision, with a higher score indicating a higher-quality response that better satisfies the criteria. Enclose the scores within the tags <score_A> </score_A>, and <score_B> </score_B>.
 You are given a user question and two responses from two AI assistants. Your task is to act as an impartial judge and evaluate which response better follows the user's instructions and provides a higher-quality answer. Avoid any position biases and ensure that the order in which the responses were presented does not influence your decision. Do not allow the length of the responses to influence your evaluation. Do not favor certain names of the assistants. Be as objective as possible.

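The k-wise prompts in this hunk ask for one score per response, wrapped in the numbered tags <score_assistant_1>, <score_assistant_2> and so on. A possible way to read those scores back is sketched below; `parse_kwise_scores` is a hypothetical name, and treating a judgment as invalid when any of the k scores is missing or out of range is an assumption rather than something the diff shows.

```python
import re
from typing import List, Optional


def parse_kwise_scores(judgment: str, k: int) -> Optional[List[float]]:
    """Collect scores for assistants 1..k, or None if any score is unusable.

    Hypothetical helper for illustration; not part of the diff.
    """
    scores: List[float] = []
    for i in range(1, k + 1):
        # Tag names follow the pattern requested in the prompt text.
        pattern = (
            rf"<score_assistant_{i}>\s*(\d+(?:\.\d+)?)\s*</score_assistant_{i}>"
        )
        found = re.findall(pattern, judgment)
        if not found:
            return None  # assumption: a partial judgment is discarded
        value = float(found[-1])  # assumption: the last occurrence wins
        if not 0.0 <= value <= 10.0:
            return None  # the prompt restricts scores to 0-10
        scores.append(value)
    return scores
```

Index 0 of the returned list corresponds to assistant 1, so the list lines up with the order in which the responses were presented.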
@@ -147,18 +213,15 @@

 class JudgmentExtractorHandler(ABC):
     @abstractmethod
-    def create(self, tokenizer):
-        ...
+    def create(self, tokenizer): ...

     @property
     @abstractmethod
-    def name(self) -> str:
-        ...
+    def name(self) -> str: ...

     @property
     @abstractmethod
-    def config_kls(self) -> type[object]:
-        ...
+    def config_kls(self) -> type[object]: ...


 """
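To illustrate the interface that the hunk above reformats, here is a minimal sketch of a concrete handler. The abstract class is copied from the diff so the block runs on its own; `DemoExtractorConfig`, `DemoExtractorHandler`, and the dictionary returned by `create` are hypothetical stand-ins, since the diff does not show what a real handler builds.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class JudgmentExtractorHandler(ABC):
    # Interface as it appears on the "+" side of the diff.
    @abstractmethod
    def create(self, tokenizer): ...

    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def config_kls(self) -> type[object]: ...


@dataclass
class DemoExtractorConfig:
    # Hypothetical config field, chosen only for illustration.
    max_score: float = 10.0


class DemoExtractorHandler(JudgmentExtractorHandler):
    def create(self, tokenizer):
        # Assumption: a real handler would build an extractor object here;
        # this sketch just bundles the tokenizer with a default config.
        return {"tokenizer": tokenizer, "config": self.config_kls()}

    @property
    def name(self) -> str:
        return "demo_extractor"

    @property
    def config_kls(self) -> type[object]:
        return DemoExtractorConfig
```

Because all three members are abstract, a subclass that omits any of them cannot be instantiated, which is the main guarantee the ABC provides.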
@@ -177,12 +240,10 @@ class JudgmentExtractor(ABC):