Replies: 3 comments 1 reply
-
@pan3793 any insight?
-
Having had a quick look at the relevant code, the analysis seems right. I'm not familiar with the PySpark stuff; maybe we can learn from Spark Connect Server how it cleans up those resources when a session that accesses Python resources closes.
-
I've looked at this a lot over the last couple of days. The Python objects that don't seem to get cleaned up are the ones created by the user code, where execute_python.py compiles it and then runs exec(). The objects initialized and used in the execute_python.py script itself seem to be cleaned up fine.
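For illustration, here is a minimal, self-contained sketch of the compile-and-exec pattern (the names are hypothetical, not Kyuubi's actual execute_python.py code). The point is that anything the user code creates lands in the shared exec namespace and stays strongly referenced there, so Python never collects it; for a py4j proxy, that also means the JVM Gateway is never told to drop its side of the binding:

```python
import gc

class FakeProxy:
    """Stand-in for a py4j JavaObject; announces when Python collects it."""
    def __del__(self):
        print("proxy collected -> py4j would now tell the JVM to drop the binding")

# Shared namespace that all exec'd user snippets write into (hypothetical name).
namespace = {"FakeProxy": FakeProxy}

def run_user_code(source: str) -> None:
    exec(compile(source, "<user_code>", "exec"), namespace)

run_user_code("obj = FakeProxy()")  # `obj` is now pinned by `namespace`
gc.collect()                        # prints nothing: still strongly referenced

namespace.clear()                   # what a session-close cleanup would have to do
gc.collect()                        # now the proxy is collected
```

If execute_python exits without ever dropping that namespace (or without py4j's finalizers getting a chance to run), the JVM-side entries would be orphaned, which would match the heap dump described below.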
-
Hello,
I am using Kyuubi with Spark on k8s, driven from Python notebooks. We launch a lot of different queries from multiple processes over a fairly long period of time, and we see the Spark driver's memory usage keep increasing. Eventually it starts to GC a lot and then OOMs.
I took a heap dump of the Spark driver, and it looks like the py4j Gateway has a ConcurrentHashMap that just keeps accumulating objects: Spark Datasets, Attributes, Expressions, etc. Basically anything the Python process might be referencing on the Java side.
From my understanding, these entries are supposed to be cleaned up when the Python side garbage-collects the corresponding proxies. That doesn't seem to be happening here, but I'm not positive. I'm wondering if the Kyuubi execute_python process is exiting without these being cleaned up.
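For anyone who wants to poke at this, here is a hedged sketch of the cleanup path as I understand py4j: when a Python-side proxy is garbage collected, py4j sends a memory command to the JVM so the Gateway drops the entry from its object map, and JavaGateway.detach() releases a binding explicitly without waiting for GC. Note this uses private PySpark internals (sparkContext._gateway, df._jdf) that can change between versions:

```python
import gc
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()
gateway = spark.sparkContext._gateway  # py4j JavaGateway (private API)

# Normal path: dropping the last Python reference lets py4j's finalizer
# send the "delete" memory command, and the JVM Gateway forgets the object.
df = spark.range(1000)
del df
gc.collect()

# Manual path: detach() releases the JVM-side binding immediately,
# without waiting for Python garbage collection.
df2 = spark.range(1000)
gateway.detach(df2._jdf)  # df2 must not be used after this
```

If the Python process is killed or exits before those finalizers run, neither path fires and the Gateway map keeps growing, which is exactly the behavior I'm seeing.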
Has anyone else run into this issue? If so, are there any known fixes?
Thanks