
No garbage collection for soft-deleted assets / orphaned S3 objects / expired share links #65

Description

@ravirajsinh45

Reported by @ducknoodledance in #63 (with a working community
workaround script).

Problem

Entities are soft-deleted via deleted_at, and the codebase intentionally
"never hard-deletes in application code" (CLAUDE.md). But no scheduled
job ever reclaims the storage, so:

  • Soft-deleted rows pile up in Postgres indefinitely.
  • The underlying S3 / MinIO objects (s3_key_raw, s3_key_processed,
    s3_key_thumbnail) are never removed, even after the parent asset,
    version, folder, or project is soft-deleted.
  • Orphans created by interrupted uploads / failed transcodes are not
    swept.
  • ShareLink.expires_at is defined but never enforced, and expired
    links are never cleaned up.

For a long-lived self-hosted install, this means the bucket and the DB
grow without bound.
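The expires_at gap is the cheapest one to close: a check at link lookup time, plus a filter a cleanup job can use. A minimal sketch, assuming timezone-aware datetimes and treating a NULL expires_at as "never expires"; the ShareLink dataclass here is a hypothetical stand-in for the real model, and is_share_link_usable / expired_links are illustrative names, not existing functions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ShareLink:
    # Hypothetical stand-in for the real model: only the fields this check needs.
    token: str
    expires_at: Optional[datetime] = None
    deleted_at: Optional[datetime] = None


def is_share_link_usable(link: ShareLink, now: datetime) -> bool:
    """Return True if the link is neither soft-deleted nor past expires_at."""
    if link.deleted_at is not None:
        return False
    # Assumption: a NULL expires_at means the link never expires.
    if link.expires_at is not None and link.expires_at <= now:
        return False
    return True


def expired_links(links: list[ShareLink], now: datetime) -> list[ShareLink]:
    """Links whose expiry has passed; candidates for a cleanup job."""
    return [link for link in links if link.expires_at is not None and link.expires_at <= now]
```

The share router would call the check before serving anything and reject the request when it returns False.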

Evidence

  • All delete endpoints only flip deleted_at — e.g. apps/api/routers/assets.py:197,
    apps/api/routers/projects.py, apps/api/routers/folders.py,
    apps/api/routers/share.py:440.
  • The only place that actually calls s3.delete_object is comment
    attachments in apps/api/routers/comments.py.
  • Celery beat schedule in apps/api/tasks/celery_app.py:59-64 only
    runs send_due_date_reminders — no cleanup task is registered.
  • MediaFile has no deleted_at; it's only reachable through the
    parent version, so once a version is soft-deleted those S3 keys
    become invisible to the app but stay in the bucket.
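Because MediaFile has no deleted_at, the keys it strands can only be found by going through its version. A minimal in-memory sketch of that lookup; Version, MediaFile and stranded_media_keys are hypothetical stand-ins for the real ORM models and query, and only version.deleted_at is checked here (a real query would also need to follow soft-deleted assets, folders and projects).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Version:
    # Hypothetical minimal model: only the fields this sketch needs.
    id: int
    deleted_at: Optional[datetime] = None


@dataclass
class MediaFile:
    # No deleted_at of its own: liveness is inherited from the parent version.
    version_id: int
    s3_key_raw: Optional[str] = None
    s3_key_processed: Optional[str] = None
    s3_key_thumbnail: Optional[str] = None


def stranded_media_keys(versions: list[Version], media_files: list[MediaFile]) -> set[str]:
    """S3 keys still in the bucket that belong to soft-deleted versions."""
    dead_versions = {v.id for v in versions if v.deleted_at is not None}
    keys: set[str] = set()
    for mf in media_files:
        if mf.version_id in dead_versions:
            # Collect whichever of the three keys are actually set.
            for key in (mf.s3_key_raw, mf.s3_key_processed, mf.s3_key_thumbnail):
                if key:
                    keys.add(key)
    return keys
```

In the database this is a single join from media files to versions where deleted_at is not null; the set logic is the same.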

Proposed fix

  • Add a Celery beat task (cleanup_soft_deleted, e.g. daily) that:
    • hard-deletes rows soft-deleted more than N days ago (configurable
      retention window, e.g. a default of 30 days)
    • deletes their S3 objects (raw, processed/HLS prefix, thumbnail)
    • cascades through versions, media files, comments, annotations,
      approvals, share links
  • Add an orphan sweeper that lists S3 prefixes and removes keys with
    no matching DB row.
  • Honor ShareLink.expires_at (reject in the API and clean up
    expired rows).
  • Document the retention window and provide a manual "purge now"
    admin endpoint.
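The two selection rules in the proposal, the retention cutoff and the orphan check, can be sketched as pure functions that a cleanup_soft_deleted task might call. This is a sketch under assumptions: S3Object, past_retention, is_referenced and find_orphans are hypothetical names; processed/HLS output is assumed to live under a per-version prefix ending in "/"; and the 24-hour grace period, which keeps the sweeper away from uploads and transcodes whose DB rows may not exist yet, is an illustrative value rather than anything in the codebase.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass
class S3Object:
    # Mirrors the Key / LastModified fields of a ListObjectsV2 entry.
    key: str
    last_modified: datetime


def past_retention(deleted_at: Optional[datetime], now: datetime, retention_days: int = 30) -> bool:
    """True if a row was soft-deleted more than retention_days ago."""
    return deleted_at is not None and deleted_at < now - timedelta(days=retention_days)


def is_referenced(key: str, exact_keys: set[str], prefixes: Iterable[str]) -> bool:
    """Live if a DB row names the key exactly or it sits under a live HLS prefix."""
    return key in exact_keys or any(key.startswith(p) for p in prefixes)


def find_orphans(
    objects: Iterable[S3Object],
    exact_keys: set[str],
    prefixes: list[str],
    now: datetime,
    grace: timedelta = timedelta(hours=24),
) -> list[str]:
    """Keys with no matching DB row, ignoring objects younger than `grace`."""
    return [
        obj.key
        for obj in objects
        if now - obj.last_modified > grace
        and not is_referenced(obj.key, exact_keys, prefixes)
    ]
```

In a real task the objects would come from paginating boto3's list_objects_v2, and orphans would be removed in batches with delete_objects, which accepts up to 1,000 keys per request. Prefix matching errs toward keeping objects: a prefix without a trailing slash, such as hls/1, would also protect hls/10/..., never delete it.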

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working), enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests