feat(convert): add CTE-based correlation pre-filtering and sliding window format #44

Merged
mostafa merged 4 commits into main from feat/athena-correlation-improvements
Apr 28, 2026

mostafa commented Apr 28, 2026

Summary

Inspired by the pySigma Athena backend, this PR adds two correlation improvements to the PostgreSQL backend:

  • CTE-based pre-filtering: for non-temporal correlation types (event_count, value_count, value_sum, value_avg, etc.), the backend now wraps the referenced rules' queries in a WITH combined_events AS (q1 UNION ALL q2 ...) CTE. The aggregate reads from combined_events, so it only sees rows that match the referenced detection rules instead of every row in the time window. When no per-rule queries are available (standalone correlation rules), the backend falls back to the previous full-table scan with a time filter.

  • sliding_window output format: a new output format for event_count correlations that uses SQL window functions instead of GROUP BY + HAVING. Each row is annotated with the number of events in its trailing window, and every row whose count reaches the threshold is emitted:

    WITH combined_events AS (...),
    event_counts AS (
        SELECT *, COUNT(*) OVER (
            PARTITION BY group_by ORDER BY time
            RANGE BETWEEN INTERVAL 'N' SECOND PRECEDING AND CURRENT ROW
        ) AS correlation_event_count
        FROM combined_events
    )
    SELECT * FROM event_counts WHERE correlation_event_count >= threshold

    The existing GROUP BY + HAVING approach remains the default format.
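
As a rough illustration of the sliding_window shape shown above, the query could be assembled along these lines. This is a minimal sketch, not the backend's actual code: the function name `sliding_window_query` and all of its parameters are hypothetical, and it only covers the variant that has a combined_events CTE.

```rust
/// Hypothetical helper (not part of rsigma): renders the sliding-window
/// query shape from the PR description for an event_count correlation.
fn sliding_window_query(
    combined_events: &str, // body of the combined_events CTE (assumed pre-built)
    group_by: &[&str],     // correlation group-by fields; may be empty
    time_field: &str,      // timestamp column used for ordering
    window_secs: u64,      // correlation timespan in seconds
    threshold: u64,        // minimum event count
) -> String {
    // Without group-by fields the window spans all rows, so the
    // PARTITION BY clause is omitted entirely.
    let partition = if group_by.is_empty() {
        String::new()
    } else {
        format!("PARTITION BY {} ", group_by.join(", "))
    };
    format!(
        "WITH combined_events AS ({combined_events}), \
         event_counts AS (\
         SELECT *, COUNT(*) OVER (\
         {partition}ORDER BY {time_field} \
         RANGE BETWEEN INTERVAL '{window_secs}' SECOND PRECEDING AND CURRENT ROW\
         ) AS correlation_event_count \
         FROM combined_events) \
         SELECT * FROM event_counts WHERE correlation_event_count >= {threshold}"
    )
}
```

Calling it with a single rule query, `&["username"]`, `"time"`, a 300 second window and a threshold of 5 yields one statement whose window clause partitions by username and looks back 300 seconds from each row.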

Also includes:

  • README updates: both root and rsigma-convert READMEs updated with new output format, SELECT column selection, CLI backend options, CTE pre-filtering, and sliding window correlation documentation.
  • CI paths-ignore: ci.yml now skips runs for markdown, docs, assets, LICENSE, and .gitignore changes.

Test plan

  • 4 new unit tests for CTE pre-filtering (event_count with CTE, multi-rule UNION ALL, fallback without queries, value_count with CTE)
  • 4 new unit tests for sliding window (with CTE, without CTE, no group-by, default format unchanged)
  • All 72 unit tests pass
  • All 11 golden tests pass
  • cargo fmt --all -- --check passes
  • cargo clippy --workspace --all-targets --all-features -- -D warnings passes

mostafa added 4 commits April 28, 2026 13:27
When per-rule converted queries are available, non-temporal correlation
types (event_count, value_count, value_sum, value_avg, etc.) now wrap
the referenced rules' queries in a WITH combined_events AS (q1 UNION
ALL q2 ...) CTE. The aggregate then reads from combined_events instead
of scanning the entire table unfiltered.

This makes correlation queries self-contained: they only aggregate over
rows that match the referenced detection rules, rather than all rows in
the time window.

Inspired by the pySigma Athena backend's combined_events CTE pattern.
When no per-rule queries are available (standalone correlation rules),
the backend falls back to the previous full-table scan with time filter.
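
The CTE wrapping and its fallback, as described in this commit message, can be sketched as follows. The helper name `event_count_query`, its parameters, and the exact text of the fallback's time filter are assumptions made for illustration; the real backend may shape the aggregate differently (for example, it may also apply a time filter on top of the CTE).

```rust
/// Hypothetical helper (not part of rsigma): builds a GROUP BY + HAVING
/// event_count query, pre-filtered through a CTE when rule queries exist.
fn event_count_query(
    rule_queries: &[&str], // converted queries of the referenced rules
    table: &str,           // table used by the fallback path
    time_field: &str,      // timestamp column used by the fallback path
    group_by: &str,        // single group-by field, for brevity
    window_secs: u64,
    threshold: u64,
) -> String {
    // Shared aggregate; only the row source and optional filter differ.
    let aggregate = |source: &str, filter: &str| {
        format!(
            "SELECT {group_by}, COUNT(*) AS event_count FROM {source}{filter} \
             GROUP BY {group_by} HAVING COUNT(*) >= {threshold}"
        )
    };
    if rule_queries.is_empty() {
        // Standalone correlation rule: scan the table, restricted by time.
        // The filter syntax here is an assumption, not the backend's output.
        let filter =
            format!(" WHERE {time_field} >= NOW() - INTERVAL '{window_secs} seconds'");
        aggregate(table, filter.as_str())
    } else {
        // Aggregate only over rows matched by the referenced rules.
        let cte = rule_queries.join(" UNION ALL ");
        format!(
            "WITH combined_events AS ({cte}) {}",
            aggregate("combined_events", "")
        )
    }
}
```

With two rule queries the result starts with `WITH combined_events AS (q1 UNION ALL q2)`; with an empty slice it never mentions combined_events and reads from the table directly.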
…lations

Add a new output format that uses SQL window functions for event_count
correlation queries, producing a per-row sliding window that emits
every event crossing the threshold within its trailing window:

  WITH combined_events AS (...),
  event_counts AS (
      SELECT *, COUNT(*) OVER (
          PARTITION BY group_by
          ORDER BY time
          RANGE BETWEEN INTERVAL 'N' SECOND PRECEDING AND CURRENT ROW
      ) AS correlation_event_count
      FROM combined_events
  )
  SELECT * FROM event_counts WHERE correlation_event_count >= threshold

The existing GROUP BY + HAVING approach remains the default, since both
have valid use cases (GROUP BY for periodic polling, window functions
for streaming/alerting).
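
To make the per-row semantics concrete, here is a small in-memory sketch of what the window query computes for a single partition. It is only an illustration under stated assumptions: timestamps are plain integer seconds, and the function names `trailing_counts` and `sliding_window_hits` are hypothetical.

```rust
/// For each event, count the events whose timestamp lies in
/// [t - window_secs, t]. This mirrors RANGE ... PRECEDING AND CURRENT ROW,
/// where rows sharing the current row's timestamp count as peers.
fn trailing_counts(times: &[i64], window_secs: i64) -> Vec<usize> {
    times
        .iter()
        .map(|&t| {
            times
                .iter()
                .filter(|&&u| u >= t - window_secs && u <= t)
                .count()
        })
        .collect()
}

/// Return the timestamps of events whose trailing-window count reaches
/// the threshold, i.e. the rows the sliding_window query would emit.
fn sliding_window_hits(times: &[i64], window_secs: i64, threshold: usize) -> Vec<i64> {
    times
        .iter()
        .zip(trailing_counts(times, window_secs))
        .filter(|(_, count)| *count >= threshold)
        .map(|(&t, _)| t)
        .collect()
}
```

For events at 0, 10, 20, 400 and 410 seconds with a 30 second window, the trailing counts are 1, 2, 3, 1 and 2, so a threshold of 3 emits only the event at 20.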

Inspired by the pySigma Athena backend's window function pattern.

Add documentation for SELECT column selection, CLI backend options
(-O key=value), CTE-based correlation pre-filtering, and the
sliding_window output format. Update output format tables and
CLI example sections in both the root README and rsigma-convert README.

Add paths-ignore to ci.yml so that pushes and PRs that only touch
markdown files, docs, assets, LICENSE, or .gitignore do not trigger
the full CI suite unnecessarily.
mostafa merged commit c527510 into main Apr 28, 2026
8 checks passed
mostafa deleted the feat/athena-correlation-improvements branch April 28, 2026 11:43