Replies: 3 comments 1 reply
-
@pan3793 any insight?
-
Having had a quick look at the relevant code, the analysis seems right. I'm not familiar with the PySpark stuff; maybe we can learn from Spark Connect Server how it cleans up those resources when a session that accesses Python resources closes.
-
I've looked at this a lot over the last couple of days. The Python objects that don't seem to get cleaned up are the ones created by the user code, where execute_python.py compiles it and then runs exec(). The objects initialized and used in the execute_python.py script itself seem to be cleaned up fine.
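For illustration, here is a minimal, self-contained sketch of the compile-and-exec pattern (the names are hypothetical, not Kyuubi's actual execute_python.py code). The point is that anything the user code creates lands in the shared exec namespace and stays strongly referenced there, so Python never collects it; for a py4j proxy, that also means the JVM Gateway is never told to drop its side of the binding:

```python
import gc

class FakeProxy:
    """Stand-in for a py4j JavaObject; announces when Python collects it."""
    def __del__(self):
        print("proxy collected -> py4j would now tell the JVM to drop the binding")

# Shared namespace that all exec'd user snippets write into (hypothetical name).
namespace = {"FakeProxy": FakeProxy}

def run_user_code(source: str) -> None:
    exec(compile(source, "<user_code>", "exec"), namespace)

run_user_code("obj = FakeProxy()")  # `obj` is now pinned by `namespace`
gc.collect()                        # prints nothing: still strongly referenced

namespace.clear()                   # what a session-close cleanup would have to do
gc.collect()                        # now the proxy is collected
```

If execute_python exits without ever dropping that namespace (or without py4j's finalizers getting a chance to run), the JVM-side entries would be orphaned, which would match the heap dump described below.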
-
Hello,
I am using Kyuubi with Spark on k8s, driven from Python notebooks. We launch a lot of different queries from multiple processes over a fairly long period of time, and we see the Spark driver's memory usage keep increasing. Eventually it starts to GC a lot and then OOMs.
I took a heap dump of the Spark driver, and it looks like the py4j Gateway has a ConcurrentHashMap that just keeps accumulating objects: Spark Datasets, Attributes, Expressions, etc. Basically anything the Python process might be referencing on the Java side.
From my understanding, these entries are supposed to be cleaned up when the Python side garbage-collects the corresponding proxies. That doesn't seem to be happening here, but I'm not positive. I'm wondering if the Kyuubi execute_python process is exiting without these being cleaned up.
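For anyone who wants to poke at this, here is a hedged sketch of the cleanup path as I understand py4j: when a Python-side proxy is garbage collected, py4j sends a memory command to the JVM so the Gateway drops the entry from its object map, and JavaGateway.detach() releases a binding explicitly without waiting for GC. Note this uses private PySpark internals (sparkContext._gateway, df._jdf) that can change between versions:

```python
import gc
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()
gateway = spark.sparkContext._gateway  # py4j JavaGateway (private API)

# Normal path: dropping the last Python reference lets py4j's finalizer
# send the "delete" memory command, and the JVM Gateway forgets the object.
df = spark.range(1000)
del df
gc.collect()

# Manual path: detach() releases the JVM-side binding immediately,
# without waiting for Python garbage collection.
df2 = spark.range(1000)
gateway.detach(df2._jdf)  # df2 must not be used after this
```

If the Python process is killed or exits before those finalizers run, neither path fires and the Gateway map keeps growing, which is exactly the behavior I'm seeing.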
Has anyone else run into this issue? If so, are there any known fixes?
Thanks