fix: Apply guardrails transformations to LLM inputs and bot outputs. #1297
Conversation
Force-pushed from 9e9d8f6 to 97871a5
In addition to the provided example and test, you can observe the unfixed behavior in the following branches (we’re planning to open PRs for these as well):
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             develop    #1297   +/-   ##
==========================================
  Coverage          ?   70.70%
==========================================
  Files             ?      161
  Lines             ?    16312
  Branches          ?        0
==========================================
  Hits              ?    11533
  Misses            ?     4779
  Partials          ?        0
```

Flags with carried forward coverage won't be shown.
Changes look good to me!

Ideally, we would also have a test that involves the flows `generating user intent for unhandled user utterance` and `continuation on unhandled user utterance` that are affected by it.
@schuellc-nvidia, thank you for the review! Would it be OK if we add these tests in a follow-up? I'd like to take a closer look at the dialog flows first to ensure meaningful coverage. I believe the already included test verifies that the rails transformations are applied. We're relying on this behavior in upcoming PRs.
Yes, that's fine with me.
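For reference, a follow-up test along the lines suggested above might look like this sketch. It assumes the `TestChat` helper from the repository's `tests/utils.py`; the config path, canned completion, and redaction behavior are hypothetical placeholders, not code from this PR:

```python
from nemoguardrails import RailsConfig
from tests.utils import TestChat  # helper used throughout this repo's test suite

def test_output_rail_transformation_is_applied():
    # Placeholder config: assumed to define a Colang 2 output rail that
    # redacts the name "Tom" in $bot_message.
    config = RailsConfig.from_path("tests/test_configs/with_output_rails")
    # The fake LLM returns the raw, unredacted text.
    chat = TestChat(config, llm_completions=["Echo: Tom"])
    chat >> "Echo this: Tom"
    # With this fix, the user-facing response is built from the processed
    # $bot_message, so the redaction should be visible here.
    chat << "Echo: [REDACTED]"
```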
Thank you @lapinek, this is a critical security fix with solid implementation 👍🏻 It is ready to merge 🚀

Would you please GPG-sign your commits, following the contributing guidelines?
- Use processed `$user_message` instead of raw `$event.final_transcript` in LLM inputs.
- Use processed `$bot_message` instead of raw `$text` in bot outputs.
- Prevent sensitive or unfiltered data from reaching the LLM or users by correctly applying input/output rails transformations.
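To make the distinction in the bullets above concrete, here is a minimal sketch of a custom input-rail action in Python. The action name and redaction logic are invented for illustration; the `@action` decorator and context injection are part of the library. With this fix, the transformed value stored as the processed `$user_message` is what reaches the LLM:

```python
from nemoguardrails.actions import action

@action(name="redact_names")
async def redact_names(context: dict):
    # The runtime injects the current context; "user_message" holds the
    # message as seen by the input rails.
    message = context.get("user_message", "")
    # Naive redaction, purely for demonstration.
    return message.replace("Tom", "[REDACTED]")
```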
…cessed message handling in Colang 2 (force-pushed from c1471c1 to 48db267)
@Pouyanpi, thanks for your review! I've signed the commits. The reason I initially used "Signed-off-by..." was that it seemed like a valid alternative according to the contribution guidelines: https://github.com/NVIDIA/NeMo-Guardrails/blob/52ac7edc18e5b6fb1900b639b5c6734dca09b918/CONTRIBUTING.md#summary. Since a GPG signature is required, maybe that wording could be more explicit. But I am happy to sign, no problem 🙂
Description
Currently, when input and output rails process (that is, transform or redact) user and bot messages, the processed versions are not used: the LLM receives the raw `$event.final_transcript` instead of the processed `$user_message`, and bot outputs are rendered from the raw `$text` instead of the processed `$bot_message`.
To reproduce, you can run this config against the current `develop` branch (requires `OPENAI_API_KEY`):

```
poetry run nemoguardrails chat --config /path/to/config

> Echo this: Tom
```
Expected output:
Actual output:
With the patch provided in this PR, the output should be:
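For a programmatic check, a sketch using the public Python API (the config path and message are the same placeholders as in the CLI example above):

```python
from nemoguardrails import RailsConfig, LLMRails

config = RailsConfig.from_path("/path/to/config")
rails = LLMRails(config)
response = rails.generate(
    messages=[{"role": "user", "content": "Echo this: Tom"}]
)
# With the patch, this output reflects the input/output rails' transformations.
print(response["content"])
```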
Related Issue(s)
Mentions
@schuellc-nvidia, @drazvan - since you’ve contributed most to the affected files, your review would be much appreciated!
Checklist