FEAT: Adding Harm Categories to Prompt Request Pieces #1116
…quest entry and pieces
This is a good start! I think we should also have an example showing how to query by harm category within a specific op label, and the memory code needs a join between prompt memory entry and attack results to check for all results with a certain harm category in the pieces.
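For illustration, the join requested in the comment above could look roughly like the sketch below. It uses the standard-library `sqlite3` module with a simplified, hypothetical schema: the table names `attack_results`, `prompt_pieces`, and `piece_harm_categories`, the `op_name` column, and the helper `attack_results_by_harm_category` are all assumptions made for this example and are not taken from the PR or the project's memory code.

```python
import sqlite3
from typing import List, Optional


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE attack_results (
            id INTEGER PRIMARY KEY,
            conversation_id TEXT NOT NULL,
            outcome TEXT NOT NULL
        );
        CREATE TABLE prompt_pieces (
            id INTEGER PRIMARY KEY,
            conversation_id TEXT NOT NULL,
            op_name TEXT              -- stands in for an "op" memory label
        );
        CREATE TABLE piece_harm_categories (
            piece_id INTEGER NOT NULL REFERENCES prompt_pieces(id),
            category TEXT NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO attack_results VALUES (?, ?, ?)",
        [(1, "conv-a", "success"), (2, "conv-b", "failure"), (3, "conv-c", "success")],
    )
    conn.executemany(
        "INSERT INTO prompt_pieces VALUES (?, ?, ?)",
        [(10, "conv-a", "op1"), (11, "conv-a", "op1"), (12, "conv-b", "op1"), (13, "conv-c", "op2")],
    )
    conn.executemany(
        "INSERT INTO piece_harm_categories VALUES (?, ?)",
        [(10, "violence"), (11, "violence"), (12, "hate"), (13, "violence")],
    )
    return conn


def attack_results_by_harm_category(
    conn: sqlite3.Connection, category: str, op_name: Optional[str] = None
) -> List[int]:
    """Return ids of attack results with at least one piece in the given harm category.

    If op_name is given, only pieces carrying that label are considered.
    """
    sql = """
        SELECT DISTINCT ar.id
        FROM attack_results AS ar
        JOIN prompt_pieces AS pp ON pp.conversation_id = ar.conversation_id
        JOIN piece_harm_categories AS hc ON hc.piece_id = pp.id
        WHERE hc.category = ?
    """
    params: List[str] = [category]
    if op_name is not None:
        sql += " AND pp.op_name = ?"
        params.append(op_name)
    sql += " ORDER BY ar.id"
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    demo = build_demo_db()
    print(attack_results_by_harm_category(demo, "violence"))
    print(attack_results_by_harm_category(demo, "violence", op_name="op1"))
```

`SELECT DISTINCT` matters here because one conversation can contain several pieces tagged with the same category; without it, the same attack result would be returned once per matching piece.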
tests/unit/memory/memory_interface/test_interface_attack_results.py
Nice work! Made a few small comments, but overall looks good!
two small comments on the comments :)
Description
This PR makes attack_results queryable by harm_categories and memory labels. Harm categories are currently present on seed prompts, but attack results could not be queried by them. To do this I made a few changes:
Tests and Documentation
Ran the notebooks and added new unit tests.