Skip to content

Ship botlog URI with query params intact#9

Merged
sergeyfast merged 4 commits into
masterfrom
fix/botlog-query-params
May 12, 2026
Merged

Ship botlog URI with query params intact#9
sergeyfast merged 4 commits into
masterfrom
fix/botlog-query-params

Conversation

@sergeyfast
Copy link
Copy Markdown
Contributor

@sergeyfast sergeyfast commented May 12, 2026

Summary

Fixes a bot-logs UI bug: Event.URI shipped to gatesrv had its query string stripped, so HasQueryParams on the gatesrv side always returned false.

Three commits, each independent:

1. 4f0704e Ship botlog URI with query params intact

  • Add ParsedLine.RawURI (path + query string) — populated in all three parse paths: text/$request, text/$uri, JSON request_uri
  • botlog Observer now ships the full URI instead of the path-only form
  • Bump DefaultBatchInterval 5s → 30s — quiet hosts were flushing ~50-event payloads every tick; larger batches compress better

2. 50cb7c7 Drop redundant ParsedLine.RawPath

Cleanup found by /simplify review:

  • RawPath and RawURI were always populated together — the middle of the RawURI → RawPath → URI fallback was never reachable
  • Remove RawPath field plus the helpers rawPathFromRequest / rawPathFromJSON, eliminating a duplicate strings.SplitN per parsed line on the hot path
  • Observer fallback collapses to RawURI → URI

3. 8ca89d7 Cap botlog Event.URI at URITruncate

Defense-in-depth for the now-full-query URIs:

  • Add Config.URITruncate (default 2048) symmetrical to UATruncate
  • Pathological bot probes routinely send 4-8KB URIs with base64/SQLi payloads — capping bounds WAL writes and gatesrv POST body size
  • Pre-fix RawPath stripped queries kept URIs naturally short; with RawURI we need an explicit cap

Test plan

  • make fmt lint test — 0 issues, all green
  • New tests:
    • TestObserver_UsesRawURIOverURI — Event.URI carries query string
    • TestObserver_FallsBackToURIWhenRawURIEmpty — legacy $uri fallback
    • TestObserver_TruncatesURI — 5KB URI capped at 2048
    • Five table-driven cases in TestParsedLine_RawURIUnnormalized covering both parsers and percent-encoded chars (%20, %2F, %3D, %26)
  • Smoke-check on staging: verify bot-log events now carry ?... in URI and gatesrv UI groups with/without query params correctly

Out of scope (deliberate)

  • Privacy / token redaction in query strings. Query strings can contain session tokens, OAuth codes, PII. Pre-fix stripQuery was an implicit privacy guard; now they ship to gatesrv. Decision is to handle redaction on the gatesrv side or accept the risk (trusted infra). Worth a follow-up conversation.

- Add ParsedLine.RawURI (path + query string) alongside RawPath
  so observers see the full hit URL; populated in all three parse
  paths (text/$request, text/$uri, JSON request_uri)
- Switch botlog Observer to prefer RawURI with fallback chain
  RawURI -> RawPath -> URI; gatesrv-side HasQueryParams now works
  on bot-log events instead of always returning false
- Bump DefaultBatchInterval 5s -> 30s; quiet hosts were flushing
  ~50-event payloads every tick, larger batches compress better
- Cover the new field across both parsers including
  percent-encoded query strings, plus the observer fallback chain
- RawPath was only read by botlog Observer as the middle of a
  three-level fallback (RawURI -> RawPath -> URI). All three parser
  paths populate RawPath and RawURI together, so RawPath was never
  the only field set — middle fallback was unreachable
- Remove rawPathFromRequest and rawPathFromJSON; this eliminates a
  duplicate strings.SplitN per parsed line on the hot path
- Collapse observer fallback to RawURI -> URI; drop the synthetic
  TestObserver_FallsBackToRawPathWhenRawURIEmpty that exercised
  the unreachable state
- Rename TestParsedLine_RawPathUnnormalized to *_RawURIUnnormalized
  and drop wantRawPath assertions; recordedLine.rawPath gone too
- Add Config.URITruncate (default 2048) symmetrical to UATruncate;
  IE's legacy 2083-char URL limit covers >99% of legitimate traffic
- Observer truncates Event.URI before BuildEvent so pathological
  bot probes (4-8KB base64/SQLi payloads in query strings) cannot
  bloat WAL writes or the gatesrv POST body
- Pre-fix RawPath stripped queries kept URIs naturally short; with
  RawURI carrying the query verbatim we need an explicit cap
- BatchInterval 5s -> 30s
- Add URITruncate row (default 2048)
@sergeyfast sergeyfast merged commit db79202 into master May 12, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant