Delegate deletion of Finished tasks to workers #291

@rakovskij-stanislav

Hello!

I noticed that karton-system becomes unresponsive at 500k+ keys. I know we should avoid such situations, but I was careless and sent several million tasks at Low priority for a retro-scan :)

While working out how to handle 27M tasks without deleting them from the queue, I noticed that 67% of the tasks are in the Finished state, but karton-system's GC is too slow to both process new tasks and delete old ones.

I see that services previously asked karton-system to change the task status (https://github.com/CERT-Polska/karton/blob/master/karton/system/system.py#L284). By moving the task state change into the task worker, you removed those extra signals to karton-system, and that's good.

I suggest deleting a Finished task immediately, as part of the worker routine.

Advantages:

  • Speeds up karton-system's GC stage, both when deleting Finished tasks and when searching for S3 resources that are candidates for deletion.

Disadvantages:

  • We would not be deleting tasks in bulk. BUT otherwise the worker has to change the task status to Finished anyway, and that is not a bulk operation either.
  • We lose the ability to do something with Finished tasks in karton-system before they are garbage-collected.
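
To make the idea concrete, here is a minimal sketch of the two variants. It is not the actual karton code: `FakeStore` is an in-memory stand-in for Redis, the `karton.task:` key prefix and the JSON layout are assumptions made for illustration, and `finish_current`, `finish_proposed` and `gc_candidates` are hypothetical names.

```python
import json
from typing import Dict, List

TASK_PREFIX = "karton.task:"  # assumed key layout, for illustration only


class FakeStore:
    """In-memory stand-in for the Redis keyspace (hypothetical helper)."""

    def __init__(self) -> None:
        self.keys: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self.keys[key] = value

    def delete(self, key: str) -> int:
        # Mirrors the usual "number of keys removed" return value.
        return 1 if self.keys.pop(key, None) is not None else 0


def finish_current(store: FakeStore, uid: str) -> None:
    """Current flow as described in this issue: one write per task to
    mark it Finished; the key stays until karton-system's GC removes it."""
    payload = {"uid": uid, "status": "Finished"}
    store.set(TASK_PREFIX + uid, json.dumps(payload))


def finish_proposed(store: FakeStore, uid: str) -> None:
    """Proposed flow: still one operation per task, but it is a delete,
    so nothing is left behind for the GC."""
    store.delete(TASK_PREFIX + uid)


def gc_candidates(store: FakeStore) -> List[str]:
    """Keys the GC would still have to find and delete."""
    return [
        key
        for key, raw in store.keys.items()
        if json.loads(raw)["status"] == "Finished"
    ]
```

In both variants the worker performs a single per-task operation, which is why losing bulk deletion should not cost much. The second variant, however, leaves no Finished task for karton-system to inspect, which is exactly the second disadvantage above.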

Please share your thoughts -- do you know of any severe issues with this approach?
