fix(authz): schedule periodic audit-log retention cleanup #13546

Open
erichare wants to merge 1 commit into release-1.10.0 from fix/schedule-authz-audit-retention-cleanup

Conversation

@erichare erichare commented Jun 8, 2026

Collaborator

Problem

clean_authz_audit_log() (services/utils.py) is implemented and unit-tested, and bulk-deletes authz_audit_log rows older than AUTHZ_AUDIT_RETENTION_DAYS (default 90; 0 disables). But it was invoked exactly once, at startup inside initialize_services(). A long-running instance never pruned again after boot, so the table grew unbounded between restarts — even though the retention field's own docstring promised rows are "deleted on startup and by the periodic cleanup task." The periodic task did not exist; this PR adds it. (Closes gap C of the OSS RBAC rollout on release-1.10.0.)

Change

Wire the existing helper to a recurring background worker, modelled on the sibling temp_flow_cleanup.CleanupWorker:

  • AuditLogCleanupWorker — new module services/task/audit_cleanup.py. A stop-event-driven asyncio task that prunes on a fixed interval, each sweep opening its own session_scope(). Best-effort: the helper already logs-and-swallows DB errors, and an outer guard keeps the loop alive across transient outages, so it never blocks the event loop or the request path.
  • Gating — the worker only schedules when AUTHZ_AUDIT_ENABLED is True and AUTHZ_AUDIT_RETENTION_DAYS > 0. Otherwise start() is a quiet no-op (no task created). retention=0 therefore stays a no-op end to end.
  • Sleep-first loop — the unconditional startup sweep still prunes at boot, so the first scheduled pass waits one interval to avoid a redundant immediate delete. The startup sweep is left unchanged.
  • New setting: AUTHZ_AUDIT_CLEANUP_INTERVAL (AuthSettings), default 86400 s (daily), floor 300 s.
  • Lifespan wiring — started right after the telemetry writer and stopped in the shutdown "Cancelling Background Tasks" step, both wrapped best-effort so cleanup scheduling can never block startup/shutdown.
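
The worker shape described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in services/task/audit_cleanup.py: `PeriodicSweeper`, its `sweep` callable, and the `enabled` flag are hypothetical names standing in for AuditLogCleanupWorker, the session-scoped call to clean_authz_audit_log(), and the AUTHZ_AUDIT_ENABLED / AUTHZ_AUDIT_RETENTION_DAYS gate.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable


class PeriodicSweeper:
    """Hypothetical stand-in for AuditLogCleanupWorker (all names are illustrative)."""

    def __init__(
        self,
        sweep: Callable[[], Awaitable[None]],
        interval: float,
        *,
        enabled: bool = True,
    ) -> None:
        self._sweep = sweep  # assumed to open its own session and prune
        self._interval = interval
        self._enabled = enabled
        self._stop_event = asyncio.Event()
        self._task: asyncio.Task[None] | None = None

    async def start(self) -> None:
        # The gate is read once here; a disabled worker never creates a task.
        # A second start() while running is also a no-op.
        if not self._enabled or self._task is not None:
            return
        self._stop_event.clear()
        self._task = asyncio.create_task(self._run())

    async def stop(self) -> None:
        # Safe to call when never started or already stopped.
        if self._task is None:
            return
        self._stop_event.set()
        await self._task
        self._task = None

    async def _sleep_or_stop(self) -> bool:
        """Wait one interval; return True if a stop was requested meanwhile."""
        try:
            await asyncio.wait_for(self._stop_event.wait(), timeout=self._interval)
        except asyncio.TimeoutError:
            return False
        return True

    async def _run(self) -> None:
        # Sleep first: the startup sweep has already pruned at boot.
        while not await self._sleep_or_stop():
            try:
                await self._sweep()
            except Exception:  # noqa: BLE001
                # Outer guard: a failing sweep must not end the loop.
                pass
```

Waiting on the stop event with a timeout, rather than calling `asyncio.sleep()`, lets `stop()` return promptly instead of waiting out a full interval, which matters with a daily default.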

Why a worker (not a setting-only change)

The retention logic already existed and was correct — the only missing piece was recurrence. Reusing the helper verbatim (rather than reimplementing the delete) keeps the bounded-table guarantee in one place, and the CleanupWorker shape is already the established pattern in this package for periodic DB cleanup.

Tests

New test_audit_cleanup_worker.py (7 tests):

  • test_worker_runs_cleanup_repeatedly_on_schedule — the acceptance test: the helper is invoked ≥2× on a tiny interval, proving recurrence (not just a startup call).
  • test_worker_is_noop_when_retention_disabled: AUTHZ_AUDIT_RETENTION_DAYS=0 ⇒ no task scheduled, helper never called.
  • test_worker_is_noop_when_audit_disabled: AUTHZ_AUDIT_ENABLED=False ⇒ no-op.
  • test_worker_prunes_old_rows_on_schedule — end-to-end against a real SQLite engine: a row inserted after startup is pruned by the scheduled worker while an in-window row survives.
  • Plus start/stop idempotency, interval resolution (override vs setting vs default), and resilience to a failing sweep.
26 passed   # new file + existing audit/retention/authorization-service suites
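
The interval-resolution precedence mentioned in the last bullet (override vs setting vs default) might look like the sketch below. It rests on assumptions: `resolve_interval` and the two constants are hypothetical names, and this PR does not state whether a setting below the 300 s floor is rejected or clamped (the sketch clamps) or whether an explicit override bypasses the floor (the sketch lets it, which is what would allow a test to run on a tiny interval).

```python
from __future__ import annotations

DEFAULT_INTERVAL_S = 86400  # daily, per the PR description
MIN_INTERVAL_S = 300        # floor, per the PR description


def resolve_interval(override: float | None, setting: int | None) -> float:
    """Pick the sweep interval: explicit override, then setting, then default."""
    if override is not None:
        # Assumption: an explicit override (e.g. from a test) is used as given.
        return override
    if setting is None:
        return DEFAULT_INTERVAL_S
    # Assumption: values below the floor are clamped rather than rejected.
    return max(setting, MIN_INTERVAL_S)
```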

Backend ruff check/ruff format clean; all pre-commit hooks (incl. detect-secrets) pass.

Notes / non-goals

  • Like the existing startup sweep and temp_flow_cleanup, each uvicorn worker runs its own sweep. The delete is a single idempotent WHERE timestamp < cutoff, so concurrent sweeps after the first are no-ops; a distributed lock is intentionally out of scope (EE territory).
  • Gates are read at start() (the env-configured common case). Runtime-toggling AUTHZ_AUDIT_ENABLED to False leaves a harmless pruner running until restart; the helper's internal retention<=0 check is the per-tick defense for retention.
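
To illustrate why concurrent sweeps are harmless, here is a small sketch of an idempotent cutoff delete against SQLite. The one-column schema, the epoch-seconds `timestamp`, and the `prune_audit_rows` name are assumptions made for the example; the real helper is clean_authz_audit_log() and its table layout is not shown in this PR.

```python
from __future__ import annotations

import sqlite3
import time

# Hypothetical minimal schema: the real authz_audit_log has more columns.
SCHEMA = "CREATE TABLE authz_audit_log (id INTEGER PRIMARY KEY, timestamp REAL NOT NULL)"

SECONDS_PER_DAY = 86400


def prune_audit_rows(
    conn: sqlite3.Connection, retention_days: int, now: float | None = None
) -> int:
    """Delete rows older than the retention window; return how many were removed."""
    if retention_days <= 0:
        # Per-call defense: retention=0 disables pruning entirely.
        return 0
    current = time.time() if now is None else now
    cutoff = current - retention_days * SECONDS_PER_DAY
    cur = conn.execute("DELETE FROM authz_audit_log WHERE timestamp < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```

Running the same delete twice with the same cutoff removes nothing the second time, so a second uvicorn worker sweeping right after the first only pays for an empty delete.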

Summary by CodeRabbit

  • New Features
    • Automatic periodic cleanup of authorization audit logs to prevent indefinite database accumulation.
    • Background worker runs daily by default to remove expired audit entries.
    • Cleanup respects audit logging configuration and only activates when auditing and retention are enabled.

clean_authz_audit_log() was only invoked once, at startup inside
initialize_services(). A long-running instance never pruned authz_audit_log
again after boot, so the table grew unbounded between restarts even though the
retention helper was already implemented and unit-tested.

Wire the same helper to a recurring background worker:

- New AuditLogCleanupWorker (services/task/audit_cleanup.py), modelled on the
  sibling temp_flow_cleanup.CleanupWorker: a stop-event-driven asyncio task that
  prunes on a fixed interval in its own session_scope(). Best-effort -- the
  helper logs-and-swallows DB errors and an outer guard keeps the loop alive, so
  it never blocks the event loop or request path.
- Gated: the worker only schedules when AUTHZ_AUDIT_ENABLED is True and
  AUTHZ_AUDIT_RETENTION_DAYS > 0; otherwise start() is a no-op. Retention=0
  stays a no-op end to end.
- Sleep-first loop: the unconditional startup sweep still prunes at boot, so the
  first scheduled pass waits one interval to avoid a redundant immediate delete.
  The startup sweep is left unchanged.
- New AUTHZ_AUDIT_CLEANUP_INTERVAL setting (default 86400 = daily, floor 300s).
- Started/stopped from the application lifespan, both best-effort.

Tests (test_audit_cleanup_worker.py) show cleanup runs repeatedly on the
recurring schedule (not just at startup), is a no-op when retention or auditing
is disabled, survives sweep failures, and -- end to end against a real SQLite
engine -- prunes a row inserted after startup while leaving in-window rows.

Closes the audit-retention-scheduling gap (C) for OSS RBAC on release-1.10.0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
coderabbitai Bot commented Jun 8, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8d04f2c6-74f2-4a2a-817e-76c4ae1ad1bd

📥 Commits

Reviewing files that changed from the base of the PR and between 6610091 and 95bc647.

📒 Files selected for processing (4)
  • src/backend/base/langflow/main.py
  • src/backend/base/langflow/services/task/audit_cleanup.py
  • src/backend/tests/unit/services/authorization/test_audit_cleanup_worker.py
  • src/lfx/src/lfx/services/settings/auth.py

Walkthrough

This PR adds a periodic background worker that automatically prunes retention-expired rows from the authz_audit_log table. The worker is configurable via settings, conditionally activated based on audit logging configuration, integrated into the FastAPI application lifespan, and covered by comprehensive unit and integration tests.

Changes

Audit Log Retention Cleanup Worker

| Layer | File(s) | Summary |
|---|---|---|
| Configuration Settings | src/lfx/src/lfx/services/settings/auth.py | Adds AUTHZ_AUDIT_CLEANUP_INTERVAL setting with a default of 86400 seconds and a minimum of 300 seconds, governing the frequency of retention cleanup sweeps. |
| Worker Implementation | src/backend/base/langflow/services/task/audit_cleanup.py | Implements AuditLogCleanupWorker class with async lifecycle (start/stop), periodic loop that sleeps between intervals or waits for stop signal, session-scoped sweep execution via clean_authz_audit_log, and exception tolerance to keep the worker alive across failures. |
| Application Lifespan Wiring | src/backend/base/langflow/main.py | Integrates the worker into FastAPI lifespan: starts the worker after telemetry initialization and stops it after MCP manager shutdown, both with best-effort error handling and appropriate logging. |
| Test Coverage | src/backend/tests/unit/services/authorization/test_audit_cleanup_worker.py | Comprehensive test suite covering recurring scheduling semantics, conditional activation based on audit settings, lifecycle idempotency, exception resilience, interval resolution precedence, and end-to-end integration test with in-memory SQLite database confirming pruning behavior. |

Sequence Diagram

sequenceDiagram
  participant Startup
  participant AuditLogCleanupWorker
  participant Database
  participant clean_authz_audit_log as clean_authz_audit_log service
  Startup->>AuditLogCleanupWorker: start()
  activate AuditLogCleanupWorker
  Note over AuditLogCleanupWorker: Verify audit enabled & retention > 0
  AuditLogCleanupWorker->>AuditLogCleanupWorker: Create async task for _run()
  Note over AuditLogCleanupWorker: Loop: sleep_or_stop(interval)
  AuditLogCleanupWorker->>Database: session_scope()
  activate Database
  Database->>clean_authz_audit_log: clean_authz_audit_log()
  activate clean_authz_audit_log
  clean_authz_audit_log->>Database: DELETE stale rows
  clean_authz_audit_log-->>Database: return pruned count
  deactivate clean_authz_audit_log
  Database-->>AuditLogCleanupWorker: session committed/rolled back
  deactivate Database
  Note over AuditLogCleanupWorker: Continue loop or break on stop event
  Startup->>AuditLogCleanupWorker: stop()
  AuditLogCleanupWorker->>AuditLogCleanupWorker: Signal stop event, await task
  deactivate AuditLogCleanupWorker

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • langflow-ai/langflow#13126: Introduces telemetry writer background service integration pattern into FastAPI lifespan, establishing the precedent for best-effort worker startup/shutdown used by this audit cleanup worker PR.

Suggested labels

fix-index

Suggested reviewers

  • dkaushik94
  • jordanrfrazier
🚥 Pre-merge checks | ✅ 8 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 69.57% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (8 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely describes the main change: scheduling periodic audit-log retention cleanup via a background worker, which directly addresses the core problem solved by this PR. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Test Coverage For New Implementations | ✅ Passed | Test file test_audit_cleanup_worker.py (241 lines, 7 tests) covers new AuditLogCleanupWorker with unit, integration, and edge cases following test_*.py convention with meaningful assertions. |
| Test Quality And Coverage | ✅ Passed | 7 comprehensive tests cover recurring cleanup execution, async patterns, gating conditions, lifecycle management, error resilience, and E2E database behavior following pytest backend conventions. |
| Test File Naming And Structure | ✅ Passed | Correct test_*.py naming, pytest async structure, 7 test functions with docstrings, fixtures, helpers, clear organization, positive/negative scenarios, edge cases, and integration test. |
| Excessive Mock Usage Warning | ✅ Passed | Mock usage is appropriate: minimal mocks (0.9 per test) target external dependencies only. Core logic tested unmocked with real asyncio objects. Includes integration test with real database. |

@github-actions github-actions Bot added the bug Something isn't working label Jun 8, 2026
github-actions Bot commented Jun 8, 2026

Contributor

✅ Test Coverage Advisor

No source changes detected without accompanying tests. Thanks for keeping coverage up! 🎉

Advisory check only — never blocks merge.

@github-actions github-actions Bot added bug Something isn't working and removed bug Something isn't working labels Jun 8, 2026
codecov Bot commented Jun 8, 2026

Codecov Report

❌ Patch coverage is 90.12346% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.49%. Comparing base (6610091) to head (95bc647).
⚠️ Report is 3 commits behind head on release-1.10.0.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/backend/base/langflow/main.py | 60.00% | 4 Missing ⚠️ |
| ...ckend/base/langflow/services/task/audit_cleanup.py | 94.28% | 4 Missing ⚠️ |

Additional details and impacted files

@@                Coverage Diff                 @@
##           release-1.10.0   #13546      +/-   ##
==================================================
+ Coverage           58.33%   58.49%   +0.16%     
==================================================
  Files                2290     2290              
  Lines              219855   219177     -678     
  Branches            32361    31136    -1225     
==================================================
- Hits               128245   128206      -39     
+ Misses              90151    89513     -638     
+ Partials             1459     1458       -1     

| Flag | Coverage Δ |
|---|---|
| backend | 65.29% <90.00%> (+0.16%) ⬆️ |
| frontend | 57.81% <ø> (+0.19%) ⬆️ |
| lfx | 54.27% <100.00%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| src/lfx/src/lfx/services/settings/auth.py | 60.54% <100.00%> (+0.27%) ⬆️ |
| src/backend/base/langflow/main.py | 64.09% <60.00%> (+2.08%) ⬆️ |
| ...ckend/base/langflow/services/task/audit_cleanup.py | 94.28% <94.28%> (ø) |

... and 245 files with indirect coverage changes

github-actions Bot commented Jun 8, 2026

Contributor

Frontend Unit Test Coverage Report

Coverage Summary

| Lines | Statements | Branches | Functions |
|---|---|---|---|
| Coverage: 43% | 43.28% (57621/133123) | 69.22% (7829/11310) | 41.49% (1291/3111) |

Unit Test Results

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 4940 | 0 💤 | 0 ❌ | 0 🔥 | 11m 52s ⏱️ |
