17 changes: 11 additions & 6 deletions pyrit/executor/attack/printer/console_printer.py
@@ -152,13 +152,18 @@ async def print_conversation_async(
# Display images if present
await display_image_response(piece)

# Always print objective scores
scores = self._memory.get_prompt_scores(prompt_ids=[str(piece.id)])
if scores:
    print()
    self._print_colored(f"{self._indent}📊 Scores:", Style.DIM, Fore.MAGENTA)
    objective_score = [score for score in scores if score.score_category == "objective"][0]
Contributor:
That is not actually what the category is for/does.

After reviewing the code, I don't believe there is a way to do this right now unless we add some kind of metadata for the objective scorer to identify the scoring records and set them apart from other scoring records.

@rlundeen2 @bashirpartovi thoughts?

Author:

Thanks for the clarification, @romanlutz; that makes sense. I used score_category == "objective" here since it was the only discriminator readily available in the current code, but I understand that's not really what the field is intended for.

I’m happy to adjust this if the team prefers adding explicit metadata to distinguish the objective scorer from the other scoring records. Let me know if you’d like me to make that change in this PR, or if it would be better to wait until we decide on the right long-term design for identifying objective scores.

@rlundeen2 @bashirpartovi

Contributor:

Checked with @rlundeen2, who made the correct observation that AttackResult has last_score. That is only the objective score of the last iteration, though; we would still miss objective scores from earlier iterations unless we compare last_score's scorer_class_identifier with the identifiers of scores from previous iterations and, on a match, present that score as an objective score. It's a bit tedious, but definitely the easiest way to get it done without changes to other parts of the code.
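For illustration, a minimal sketch of the identifier-matching idea just described (it is superseded by the next comment); it assumes each Score exposes a scorer_class_identifier that can be compared across iterations, and the helper name is illustrative, not PyRIT's confirmed API:

```python
# Hypothetical helper sketching the identifier-matching idea above.
# Assumes each Score carries a scorer_class_identifier that can be
# compared across iterations; illustrative only, not the final design.
def is_objective_score(score, last_score) -> bool:
    """Treat a score as objective if it came from the same scorer
    class as the attack's final (objective) score."""
    if last_score is None:
        return False
    return score.scorer_class_identifier == last_score.scorer_class_identifier
```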

Contributor:

Actually, never mind: just display the last_score from AttackResult when it shows up for one of the responses. We don't need the objective scores for every iteration. If someone wants all scores, they can set the flag to print all auxiliary scores.

Author:

Thanks for clarifying, @romanlutz. That makes sense. I’ll adjust the implementation to display only the last_score from AttackResult as suggested. This keeps the change straightforward while still ensuring the objective score is shown. If users need full iteration details, they can indeed enable the auxiliary scores flag.

    self._print_score(objective_score)

# Print auxiliary scores only if requested
if include_auxiliary_scores:
    scores = self._memory.get_prompt_scores(prompt_ids=[str(piece.id)])
    if scores:
        print()
        self._print_colored(f"{self._indent}📊 Scores:", Style.DIM, Fore.MAGENTA)
        for score in scores:
            if score.score_category == "auxiliary":
                self._print_score(score)

print()
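As a rough sketch of the direction agreed in the thread, assuming the printer is handed the AttackResult for the conversation and that last_response identifies the piece the final score belongs to (both attribute usages are assumptions drawn from the discussion, not confirmed API):

```python
# Sketch only: surface the objective score via AttackResult.last_score.
# Assumes the printer has access to attack_result and that last_response
# lets us match the score to the final response piece; names may differ.
if (
    attack_result.last_score is not None
    and attack_result.last_response is not None
    and str(piece.id) == str(attack_result.last_response.id)
):
    print()
    self._print_colored(f"{self._indent}📊 Objective score:", Style.DIM, Fore.MAGENTA)
    self._print_score(attack_result.last_score)
```

This keeps per-iteration objective scores out of the default output, matching the decision above, while the include_auxiliary_scores flag still covers the full scoring history.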