Severe memory leak/accumulation - Due to ZeroMQ subscription data build up #682

@dwsutherland

Description

At NIWA we've noticed a long-running UIS (via cylc gui) consuming roughly 5 GB of memory a day:

Image

As a workaround we've been restarting the UIS every 4-5 days.

We're not constantly stopping and starting workflows under different runs; we have a fairly static set of ~57 workflows in operation, and a similar number in the test system.

Reproducible Example

Just running a UIS instance with one workflow running will, given time, show a reduction in free memory on the machine.
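
To put numbers on the growth, the resident memory of the UIS process can be sampled over time. The following is a minimal sketch, assuming a Linux host where /proc/<pid>/status is readable; the function names `rss_kib` and `sample_rss` are made up for illustration and are not part of Cylc.

```python
import os
import time
from pathlib import Path


def rss_kib(pid: int) -> int:
    """Return the resident set size in KiB from /proc/<pid>/status (Linux only)."""
    for line in Path(f"/proc/{pid}/status").read_text().splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like: "VmRSS:     123456 kB"
            return int(line.split()[1])
    raise RuntimeError(f"VmRSS not found for pid {pid}")


def sample_rss(pid: int, samples: int, interval_s: float) -> list[tuple[float, int]]:
    """Collect (elapsed_seconds, rss_kib) pairs for a process."""
    start = time.monotonic()
    readings = []
    for _ in range(samples):
        readings.append((time.monotonic() - start, rss_kib(pid)))
        time.sleep(interval_s)
    return readings


if __name__ == "__main__":
    # Assumption: replace os.getpid() with the PID of the running UIS,
    # and use a much longer interval (e.g. minutes) for a real check.
    for elapsed, kib in sample_rss(os.getpid(), samples=3, interval_s=1.0):
        print(f"{elapsed:8.1f}s  {kib} KiB")
```

A steadily rising RSS under a static set of workflows would be consistent with the behaviour described here.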

After analyzing objects in the UIS, the problem didn't appear to lie in normal objects like the data store and other UIS attributes:

Image

Image

However, using this tool:
https://github.com/bloomberg/memray
on a UIS with 5 small workflows, I was able to find the exact location of the problem:

Image

This shows an increase over time:

Image

Image

It appeared to be the executor threads of the data store used for ZeroMQ subscriptions (getting data from the Scheduler(s)), threads which this tool can isolate:

Image

More specifically, the delta processing after receipt is accumulating in size:

Image

I suspect this has something to do with the data going from the subscription thread to the main thread.

Expected Behaviour

For a relatively static load of workflows, the UIS should consume a static amount of memory (not grow).
