Description
A small amount of memory (~1kb) is allocated with new connections or subscriptions (I haven't ascertained which yet) which doesn't appear to be released when the connection / subscription terminates.
The amount of memory is very small, so it is unlikely to be noticed in real-world situations. However, if for some reason a client were to time out and reconnect at high frequency, this could lead to a slow memory leak.
Important
Recommend investigating with cylc/cylc-flow#6985 merged in.
Steps To Reproduce
Configure a frequent websocket ping with a completely unreasonable timeout:
c.ServerApp.websocket_ping_interval = 3
c.ServerApp.websocket_ping_timeout = 0.000001
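For reference, these settings go in a Jupyter Server config file that the UI Server loads; the exact path below is my assumption, not something this issue depends on:

# ~/.cylc/uiserver/jupyter_config.py  (assumed location; any config file
# picked up by the UI Server / Jupyter Server will do)
c = get_config()  # noqa: F821 - injected by Jupyter's config loader

# ping the websocket every 3 seconds, but time the ping out almost
# immediately, forcing the client to disconnect and reconnect on every ping
c.ServerApp.websocket_ping_interval = 3
c.ServerApp.websocket_ping_timeout = 0.000001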
You will need to hack jupyter_server to allow you to use float values for the ping_timeout; here's the patch (I will raise it as a PR to jupyter_server in due course):
diff --git a/jupyter_server/serverapp.py b/jupyter_server/serverapp.py
index 1afbef4d0..53ecc04a7 100644
--- a/jupyter_server/serverapp.py
+++ b/jupyter_server/serverapp.py
@@ -1649,7 +1649,7 @@ class ServerApp(JupyterApp):
return "jupyter_server.gateway.connections.GatewayWebSocketConnection"
return ZMQChannelsWebsocketConnection
- websocket_ping_interval = Integer(
+ websocket_ping_interval = Float(
config=True,
help="""
Configure the websocket ping interval in seconds.
@@ -1666,12 +1666,16 @@ class ServerApp(JupyterApp):
the connection will be closed from the server side.
""",
)
- websocket_ping_timeout = Integer(
+ websocket_ping_timeout = Float(
config=True,
help="""
Configure the websocket ping timeout in seconds.
- See ``websocket_ping_interval`` for details.
+ The default is the same as the ping interval. Set to ``0`` to turn
+ the ping timeout off.
+
+ Note that the websocket ping timeout requires the ping interval to
+ be set, see ``websocket_ping_interval`` for details.
""",
)
This will cause the client to reconnect every ~3 seconds.
# launch the server
cylc gui --new
# connect the client in a new browser tab
# wait for the client to connect, then attach your profiler of choice
mprof attach <pid>
# wait for a few connect/disconnect cycles to pass, then collect your results
mprof plot
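If mprof attach isn't available in your version of memory_profiler, a crude stand-in (my own sketch, not part of the original investigation) is to sample the server's RSS with psutil and plot the output afterwards:

# rss_sample.py <pid>  - rough alternative to mprof: print the RSS once a second
import sys
import time

import psutil  # third-party: pip install psutil

proc = psutil.Process(int(sys.argv[1]))
while True:
    # resident set size in MiB, with a timestamp for plotting later
    print(f"{time.time():.0f} {proc.memory_info().rss / 1024 ** 2:.2f}", flush=True)
    time.sleep(1)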
Profiling
Because the amount of memory leaked is so small, it is quite hard to trace.
The best approach I have come up with so far is to compare memory snapshots, the diff of which should contain the answer.
There are two tools I'm aware of which can do this:
tracemalloc.take_snapshot().compare_to(previous, 'lineno')
pympler.tracker.SummaryTracker.print_diff
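As a rough standalone illustration, both approaches look something like this (a minimal sketch, not the instrumentation actually used here; pympler is a third-party package):

import tracemalloc

from pympler import tracker

tracemalloc.start()
baseline = tracemalloc.take_snapshot()
summary_tracker = tracker.SummaryTracker()

# ... trigger a connect / disconnect cycle here ...

# tracemalloc: diff a new snapshot against the baseline, grouped by source line
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.compare_to(baseline, 'lineno')[:10]:
    print(stat)

# pympler: summarise objects created / destroyed since the tracker was created
summary_tracker.print_diff()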
Here's a patch which uses tracemalloc:
diff --git a/cylc/uiserver/graphql/tornado_ws.py b/cylc/uiserver/graphql/tornado_ws.py
index fd58032..de24895 100644
--- a/cylc/uiserver/graphql/tornado_ws.py
+++ b/cylc/uiserver/graphql/tornado_ws.py
@@ -23,6 +23,7 @@
#
# It has been evolved to suit and ported to graphql-core v3.
+import tracemalloc
import asyncio
from asyncio.queues import QueueEmpty
from contextlib import suppress
@@ -46,6 +47,8 @@ from cylc.flow.network.graphql_subscribe import subscribe
from cylc.uiserver.authorise import AuthorizationMiddleware
from cylc.uiserver.schema import SUB_RESOLVER_MAPPING
+tracemalloc.start()
+SNAP = tracemalloc.take_snapshot()
NO_MSG_DELAY = 1.0
@@ -166,7 +169,7 @@ class TornadoSubscriptionServer:
):
self.schema = schema
self.loop = loop
- self.middleware = middleware
+ self.middleware = middleware or []
self.execution_context_class = execution_context_class
self.auth = auth
@@ -310,6 +313,25 @@ class TornadoSubscriptionServer:
async def on_start(self, connection_context, op_id, params):
# Attempt to unsubscribe first in case we already have a subscription
# with this id.
+ global SNAP
+ snapshot = tracemalloc.take_snapshot()
+ cmp = [
+ s
+ for s in snapshot.compare_to(SNAP, 'lineno')
+ if 'cylc/' in str(s)
+ # or 'tornado/' in str(s)
+ # or 'graphql/' in str(s)
+ ]
+ print()
+ print('+/-', sum(stat.size_diff for stat in cmp))
+ # with open('log', 'a') as log_file:
+ # log_file.write(str(sum(stat.size_diff for stat in cmp)) + '\n')
+ # log_file.flush()
+ for stat in sorted(cmp, key=lambda x: x.size_diff, reverse=True):
+ if stat.size_diff != 0:
+ print(f' {stat}')
+ print()
+ SNAP = snapshot
await connection_context.unsubscribe(op_id)
params['kwargs']['root_value'] = op_id
This diff filters for memory allocated from within the Cylc codebase.
Note that memory diffs can be a bit jumpy; sometimes memory isn't deallocated for a while, but is eventually.
Remove the filter and uncomment the with open('log', 'a') block to get the full memory diffs and log the net change to a file. Plot the result and you'll get something like this:
[plot of the logged net memory change over time]
It jumps around; the leak is barely visible in the long-term trend.
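For reference, here's a rough plotting script for the logged values (the 'log' filename matches the commented-out write in the patch above; plotting the cumulative sum of the per-cycle diffs is my own choice, so the original plot may differ):

# plot_leak.py - plot the net memory change logged by the patched on_start
import matplotlib.pyplot as plt

with open('log') as log_file:
    # one net byte delta (sum of size_diff) per on_start call
    deltas = [int(line) for line in log_file if line.strip()]

# the cumulative sum shows the long-term trend through the noise
totals = [sum(deltas[:i + 1]) for i in range(len(deltas))]

plt.plot(totals)
plt.xlabel('subscription start (on_start call)')
plt.ylabel('cumulative size diff (bytes)')
plt.show()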
Leaks
The leak may not be in the Cylc codebase at all.
So far, I have found one minor leak originating from Cylc; however, it doesn't seem to make a big difference: cylc/cylc-flow#6985