Pensar - auto fix for Unvalidated LLM Prompt Injection in Code Generation Workflow #16
This patch addresses the ML01 (Adversarial Input Manipulation) and ML09 (Model Output Manipulation) vulnerabilities by implementing validation for the prompt inputs that directly affect LLM behavior.
The key changes include:

- Added a regex-based validation system to detect potentially harmful patterns in prompts that could be used for malicious code generation or to bypass validation.
- Implemented real-time validation during input, giving users immediate feedback when potentially harmful content is detected (see the form sketch after this list).
- Enhanced form submission to reject prompts containing suspicious patterns, blocking potential attack vectors.
- Added a visual warning about the risks of using advanced mode, educating users about its potential security implications.
- Added UI feedback (red borders and error messages) to surface validation issues.
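The following is a minimal sketch of how these pieces could fit together in a React/TypeScript form; the component name `AdvancedPromptForm`, the `onSubmit` prop, and the `validatePrompt` helper (imported from a hypothetical `promptValidation` module, sketched after the next paragraph) are illustrative assumptions, not the PR's actual code.

```tsx
import { useState, ChangeEvent, FormEvent } from "react";
// Hypothetical module; a sketch of its pattern matching appears below.
import { validatePrompt } from "./promptValidation";

export function AdvancedPromptForm({ onSubmit }: { onSubmit: (prompt: string) => void }) {
  const [prompt, setPrompt] = useState("");
  const [error, setError] = useState<string | null>(null);

  // Real-time validation: re-check the prompt on every keystroke.
  const handleChange = (e: ChangeEvent<HTMLTextAreaElement>) => {
    const value = e.target.value;
    setPrompt(value);
    const result = validatePrompt(value);
    setError(result.ok ? null : result.reason ?? "Potentially harmful content detected.");
  };

  // Block submission entirely when the prompt fails validation.
  const handleSubmit = (e: FormEvent) => {
    e.preventDefault();
    const result = validatePrompt(prompt);
    if (!result.ok) {
      setError(result.reason ?? "Potentially harmful content detected.");
      return;
    }
    onSubmit(prompt);
  };

  return (
    <form onSubmit={handleSubmit}>
      {/* Visual warning about the risks of advanced mode */}
      <p role="alert">
        Advanced mode sends your prompt directly to the LLM; malicious or careless
        prompts can lead to unsafe generated code.
      </p>
      <textarea
        value={prompt}
        onChange={handleChange}
        // Red border signals a validation failure.
        style={{ borderColor: error ? "red" : undefined }}
      />
      {error && <span style={{ color: "red" }}>{error}</span>}
      <button type="submit" disabled={!!error}>Generate</button>
    </form>
  );
}
```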
The validation specifically looks for patterns such as code execution functions, imports, environment variable access, and other potentially dangerous operations that could lead to harmful code generation or malicious actions. It also rejects excessively long prompts that might be used for prompt injection attacks.
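A sketch of that pattern check, under the same assumptions as above: the specific regexes, the error messages, and the 2000-character cap are illustrative placeholders, since the PR description does not state the exact patterns or limit.

```ts
// Hypothetical promptValidation.ts backing the form sketch above.

const MAX_PROMPT_LENGTH = 2000; // assumed cap; the real limit is not given in the PR

// Regex patterns for the categories of operations the PR description calls out.
const SUSPICIOUS_PATTERNS: { pattern: RegExp; reason: string }[] = [
  { pattern: /\b(eval|exec|execSync|spawn|system)\s*\(/i, reason: "code execution function" },
  { pattern: /\b(import|require)\s*[( ]/i, reason: "module import" },
  { pattern: /\b(process\.env|os\.environ|getenv)\b/i, reason: "environment variable access" },
  { pattern: /\b(child_process|subprocess|rm\s+-rf)\b/i, reason: "shell or filesystem operation" },
  { pattern: /ignore\s+(all\s+)?previous\s+instructions/i, reason: "validation-bypass phrasing" },
];

export function validatePrompt(prompt: string): { ok: boolean; reason?: string } {
  // Reject excessively long prompts that could be used for prompt injection.
  if (prompt.length > MAX_PROMPT_LENGTH) {
    return { ok: false, reason: `Prompt exceeds ${MAX_PROMPT_LENGTH} characters.` };
  }
  // Reject prompts containing any suspicious pattern.
  for (const { pattern, reason } of SUSPICIOUS_PATTERNS) {
    if (pattern.test(prompt)) {
      return { ok: false, reason: `Blocked: prompt contains a ${reason}.` };
    }
  }
  return { ok: true };
}
```

Because the check is client-side and regex-based, it is best treated as a usability and defense-in-depth measure rather than a complete guard against prompt injection.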
This approach requires no new dependencies and maintains the existing workflow while adding critical security checks for user-provided prompts that influence LLM behavior.