When using EvalView, please follow these security best practices:

- **Review dependencies**: Use tools like `pip-audit` to check for known vulnerabilities
- **Lock versions**: Use `requirements.txt` or `poetry.lock` to pin dependency versions

## Built-in Security Features

### SSRF (Server-Side Request Forgery) Protection

EvalView includes built-in protection against SSRF attacks. By default in production mode, requests to the following destinations are blocked:

- **Private IP ranges**: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
- **Loopback addresses**: localhost, 127.0.0.0/8
- **Cloud metadata endpoints**: 169.254.169.254 (AWS, GCP, Azure)
- **Link-local addresses**: 169.254.0.0/16
- **Internal hostnames**: kubernetes.default, metadata.google.internal

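A guard with this behavior can be sketched using only Python's standard library. This is an illustrative sketch, not EvalView's actual implementation: the function name, the hostname blocklist, and the resolution strategy are assumptions mirroring the list above.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Hostnames blocked outright, before any DNS resolution
# (hypothetical list mirroring the documentation above).
BLOCKED_HOSTNAMES = {"localhost", "kubernetes.default", "metadata.google.internal"}


def is_url_allowed(url: str, allow_private_urls: bool = False) -> bool:
    """Return False if the URL targets a private or internal destination."""
    if allow_private_urls:
        return True  # local-development mode: no restrictions
    host = urlparse(url).hostname
    if host is None or host.lower() in BLOCKED_HOSTNAMES:
        return False
    try:
        # Literal IPs are checked directly; hostnames are resolved first.
        addr = ipaddress.ip_address(host)
    except ValueError:
        try:
            addr = ipaddress.ip_address(socket.gethostbyname(host))
        except OSError:
            return False  # unresolvable hosts are rejected
    # is_private covers 10/8, 172.16/12, and 192.168/16; loopback covers
    # 127/8; link-local covers 169.254/16, including 169.254.169.254.
    return not (addr.is_private or addr.is_loopback or addr.is_link_local)
```

A real implementation must also defend against DNS rebinding (the address can change between this check and the actual request) and validate every hop of an HTTP redirect chain.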
#### Configuration

For local development, SSRF protection allows private URLs by default. To enable strict mode in production:

```yaml
# .evalview/config.yaml
allow_private_urls: false  # Block private/internal networks (recommended for production)
```

#### Security Considerations

- When running EvalView in production environments, set `allow_private_urls: false`
- Be cautious when loading test cases from untrusted sources: they can specify arbitrary endpoints
- Review test case YAML files before running them in sensitive environments

### LLM Prompt Injection Mitigation

The LLM-as-judge feature includes protections against prompt injection attacks:

1. **Output Sanitization**: Agent outputs are sanitized before being sent to the LLM judge
   - Long outputs are truncated (default: 10,000 chars) to prevent token exhaustion
   - Control characters are removed
   - Common prompt delimiters are escaped (`` ``` ``, `###`, `---`, XML tags, etc.)

2. **Boundary Markers**: Untrusted content is wrapped in unique cryptographic boundary markers

3. **Security Instructions**: The judge prompt explicitly instructs the LLM to:
   - Ignore any instructions within the agent output
   - Only evaluate content quality, not meta-instructions
   - Not follow commands embedded in the evaluated content
#### Limitations

While these mitigations reduce risk, they cannot completely prevent sophisticated prompt injection attacks. Consider:

- Agent outputs could still influence LLM evaluation through subtle manipulation
- Very long outputs may be truncated, potentially hiding issues
- New prompt injection techniques may bypass current protections

For high-stakes evaluations, consider:

- Manual review of agent outputs
- Multiple evaluation models
- Structured evaluation criteria that are harder to manipulate

## Known Security Considerations

### LLM-as-Judge Evaluation

- EvalView uses OpenAI's API for output quality evaluation
- Test outputs and expected outputs are sent to OpenAI for comparison
- Agent outputs are sanitized to mitigate prompt injection, but no protection is 100% effective
- **Recommendation**: Don't include sensitive/proprietary data in test cases if using LLM-as-judge

### HTTP Adapters

- Custom HTTP adapters may expose your agent endpoints
- SSRF protection is enabled by default but can be bypassed with `allow_private_urls: true`
- **Recommendation**: Use authentication, HTTPS, and rate limiting on agent endpoints

### Trace Data

- Execution traces may contain sensitive information from agent responses
- **Recommendation**: Sanitize traces before sharing or storing long-term

### Verbose Mode

The `--verbose` flag may expose sensitive information in logs:

- API request/response payloads
- Query content and agent outputs
- **Recommendation**: Avoid using verbose mode in production or when processing sensitive data

## Security Updates

We will disclose security vulnerabilities through: