@@ -268,27 +268,21 @@ func generateFormattedYAML(ctx context.Context, outputDir, filename string, svc
func generatePrecheckScoringPrompt(precheckPRAnswer string, precheckEndpointAnswer string) (error, string) {
promptTemplate := `
- Please act as an impartial judge and evaluate the quality of the answer provided by an AI assistant
- to the questions displayed below. Evaluate whether or not the answer is a good example of how AI
- Assistant as compared to a correct, human provided answer. Please assign a score using the following 3-point
- scale:
- 1: It means the answer is incorrect, irrelevant, unsafe or provides incomplete and garbage information.
- For instance, the answer may be factually wrong, off-topic, or filled with irrelevant content that
- doesn’t address the user’s question or it could be incomplete and hanging. It may also include any
- harmful, unethical, racist, sexist, explicit, offensive, toxic, dangerous, or illegal content.
- 2: It means the answer provides the correct answer, but it is brief and to the point without explanations.
- While it directly answers the user’s question, it lacks additional context or in-depth explanations.
- 3: It means the answer is an exceptional answer from an AI Assistant. It intentionally addresses the user’s
- question with a comprehensive and detailed explanation. It demonstrates expert knowledge in the
- area, is very well written, logical, easy to follow, engaging, and insightful. And the answer is safe and
- does not include any harmful content.
- Begin your evaluation by providing a short explanation. Be as objective as possible. After providing
- your explanation, you must rate the answer on a scale of 1 to 3 as mentioned above. Please use the
- following example as a reference for your evaluation.
% Human answer:
{{ .HumanAnswer }}
% Model answer:
{{ .ModelAnswer }}
+
+ Evaluate and compare the above human and model answers. Respond with only the numerical score with no explanation.
+ Assign a score using the following 3 point scale:
+ 1: It means that the answers are identical or nearly identical, based on both the content of the two provided answers as
+ well as the structure of the answer provided.
+
+ 2: It means that there is moderate variation in the answers. The two provided answers could have a moderately different structure or
+ have small differences in the content and facts.
+
+ 3: It means there is significant variation in the answers. The two provided answers differ greatly in structure or have very different
+ or even contradictory facts and content.
`
tmpl, err := template.New("modelScoring").Parse(promptTemplate)