Summary
Add a new configuration option `include_response_body` to HTTP probes that captures the response body and includes it in the `/probe?debug=true` output.
Use Case
When monitoring HTTP health endpoints, operators often need to see why a service is unhealthy, not just that it's unhealthy.
Example: Health Check Endpoint
Many services expose health endpoints that return detailed component status:
```text
GET /health
HTTP/1.1 200 OK

Service Health Check
Database: ok
Cache: ok
Queue: CRITICAL - Connection refused
External API: ok
```
Or with degraded status (HTTP 418/503):
```text
GET /health
HTTP/1.1 418 I'm a teapot

Service Health Check
Worker 1: ok
Worker 2: ok
Worker 3: DEGRADED - High latency
Worker 4: CRITICAL - Not responding
```
The Problem
Currently, blackbox_exporter only exports numeric metrics:
```text
probe_success 1
probe_http_status_code 200
probe_duration_seconds 0.023
```
The response body containing diagnostic details is discarded in `prober/http.go`:

```go
_, err = io.Copy(io.Discard, byteCounter)
```

This means monitoring tools (Icinga, Nagios, custom dashboards) can only show:
- "Service is UP" or "Service is DOWN"
- HTTP status code
They cannot show:
- Which specific component failed
- Error messages from the health check
- Degradation details
Current Workarounds (All Suboptimal)
| Workaround | Problem |
|---|---|
| Make a separate HTTP call from the monitoring tool | Duplicate network traffic, timing inconsistency |
| Parse Prometheus metrics for details | The response body isn't a metric, so it can't be stored |
| Add a custom exporter | Maintenance burden, reinventing blackbox_exporter |
| Use `fail_if_body_not_matches_regexp` | Boolean match only, no detail extraction |
Proposed Solution
Add an `include_response_body: bool` option to the HTTP probe configuration:

```yaml
modules:
  http_health_detailed:
    prober: http
    timeout: 5s
    http:
      method: GET
      include_response_body: true  # NEW OPTION
```

When the option is enabled and `debug=true` is requested, include the response body in the output:
```text
GET /probe?target=http://service/health&module=http_health_detailed&debug=true

Logs for the probe:
ts=... level=info msg="Probe succeeded" ...

Response Body:
Service Health Check
Database: ok
Cache: ok
Queue: CRITICAL - Connection refused
External API: ok

Metrics that would have been returned:
probe_success 1
probe_http_status_code 200
...
```
Implementation Details
Scope of Changes
| File | Change |
|---|---|
| `config/config.go` | Add `IncludeResponseBody bool` to the `HTTPProbe` struct |
| `prober/http.go` | Capture the body when the option is enabled, return it as an additional value |
| `prober/handler.go` | Include the body in `DebugOutput()` when available |
Memory Safety
Body capture should be limited to prevent memory exhaustion:
```go
const maxBodySize = 64 * 1024 // 64 KiB cap on the captured body

// Read from the same reader the probe already drains (byteCounter in
// prober/http.go) so existing body-length accounting is unaffected.
body, err := io.ReadAll(io.LimitReader(byteCounter, maxBodySize))
if err == nil {
	_, err = io.Copy(io.Discard, byteCounter) // discard the remainder
}
```

Backward Compatibility
- Default: `false` (no behavior change)
- Opt-in: Only captures the body when explicitly configured
- Debug-only: The body only appears in debug output, not in metrics
- No breaking changes: Existing configurations work unchanged
Why Debug Output?
The response body is diagnostic information, not a metric. It belongs in debug output because:
- Not numeric: Can't be a Prometheus metric
- Variable size: Could be large, shouldn't be in every scrape
- On-demand: Only needed when investigating issues
- Existing pattern: Debug output already contains logs and config
Example Integration
Icinga/Nagios
A monitoring plugin can fetch debug output when an alert fires:
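```shell
# When probe_success == 0, fetch details (query parameters URL-encoded by curl)
curl -sG "http://blackbox:9115/probe" \
  --data-urlencode "target=${TARGET}" \
  --data-urlencode "module=${MODULE}" \
  --data-urlencode "debug=true" | \
  sed -n '/^Response Body:/,/^Metrics/p'
```

Custom Dashboard
Show the response body in an alert-details panel by querying the debug endpoint.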
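For a check plugin written in Go rather than shell, the `sed` range above could be replaced by a small parser. `extractResponseBody` is a hypothetical helper, and it assumes the section headers match the example debug output in this proposal.

```go
package main

import (
	"fmt"
	"strings"
)

// extractResponseBody (hypothetical) returns the lines between the proposed
// "Response Body:" header and the "Metrics that would have been returned:"
// header; ok is false when the debug output has no body section.
func extractResponseBody(debug string) (body string, ok bool) {
	const startHeader = "Response Body:"
	const endHeader = "Metrics that would have been returned:"

	var b strings.Builder
	inBody := false
	for _, line := range strings.Split(debug, "\n") {
		switch {
		case line == startHeader:
			inBody, ok = true, true
		case line == endHeader:
			inBody = false
		case inBody:
			b.WriteString(line)
			b.WriteByte('\n')
		}
	}
	// Drop the blank separator line(s) before the metrics header.
	return strings.TrimRight(b.String(), "\n"), ok
}

func main() {
	debug := "Logs for the probe:\nts=... level=info msg=\"Probe succeeded\"\n\n" +
		"Response Body:\nService Health Check\nQueue: CRITICAL - Connection refused\n\n" +
		"Metrics that would have been returned:\nprobe_success 1\n"
	if body, ok := extractResponseBody(debug); ok {
		fmt.Println(body)
	}
}
```

Unlike the `sed` range, this excludes the metrics header line itself from the extracted text.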
Alternatives Considered
1. New /response Endpoint
```text
GET /response?target=...&module=...
```
Rejected: adds API surface and requires separate caching logic.
2. Store Body in Metric Label
```text
probe_http_response_body{body="..."} 1
```
Rejected: labels shouldn't contain arbitrary text, and unbounded label values cause a cardinality explosion.
3. Custom Info Metric
```text
probe_http_response_info{component="database",status="ok"} 1
```
Rejected: requires parsing logic and is not generic.
Questions for Maintainers
- Is 64KB a reasonable default limit for body capture?
- Should there be a configurable `max_response_body_size` option?
- Should body capture be a global option or per-module only?
- Any concerns about memory usage in high-frequency debug scenarios?
Willingness to Contribute
I'm willing to submit a PR implementing this feature if the maintainers are interested.