Describe the bug
When multiple DataFrames are parsed within the same process (for example, a web server that receives a user request, calls `pygwalker.to_html(df)` to render a visualization, and returns the result to the client), memory is never reclaimed. Several instance methods and properties in the data parser classes are decorated with Python's `functools.lru_cache`. Because `lru_cache` includes the bound instance (`self`) in its cache key, these method-level caches hold a strong reference to every parser object they have seen. Each parser in turn holds a reference to its DataFrame, which may be very large, so neither the parser nor its DataFrame can be garbage collected after use. Parsing several DataFrames in one session therefore accumulates memory, leading to leaks and excessive memory consumption.
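The mechanism can be demonstrated in isolation, without pygwalker. The `Parser` class below is a hypothetical stand-in for a data parser, not pygwalker's actual class; it only shows how an `lru_cache`-decorated method keeps `self` (and everything it references) alive:

```python
import functools
import gc
import weakref

class Parser:
    """Hypothetical stand-in for a data parser holding a large DataFrame."""
    def __init__(self, data):
        self.data = data  # stands in for the large DataFrame

    @functools.lru_cache(maxsize=None)  # cache key includes `self` -> strong ref
    def schema(self):
        return len(self.data)

parser = Parser([0] * 1000)
parser.schema()  # populates the class-level cache with `self` in the key

ref = weakref.ref(parser)
del parser
gc.collect()
assert ref() is not None  # the parser survives: the method cache still holds it

Parser.schema.cache_clear()  # clearing the cache finally releases the instance
gc.collect()
assert ref() is None
```

The first assertion is the bug in miniature: dropping every user-visible reference to the parser does not free it, because the cache dictionary on the decorated method still reaches it through the key.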
To Reproduce
Steps to reproduce the behavior:
- Create a large DataFrame (hundreds of MB).
- Call `pygwalker.to_html(df)` (or equivalent), creating a parser and triggering rendering.
- Remove references to the DataFrame and parser, then manually trigger garbage collection.
- Observe that process memory usage remains elevated; with each subsequent parse of a new DataFrame, memory usage grows.
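The accumulation described in the steps above can be sketched with the same stand-in parser (hypothetical names, not pygwalker's internals): each render call creates a parser that goes out of scope, yet every instance remains reachable through the method-level cache.

```python
import functools
import gc

class Parser:
    """Hypothetical stand-in for a data parser holding a large DataFrame."""
    def __init__(self, data):
        self.data = data

    @functools.lru_cache(maxsize=None)  # cache key includes `self`
    def schema(self):
        return len(self.data)

def to_html(data):
    """Stand-in for pygwalker.to_html: build a parser, render, return HTML."""
    p = Parser(data)
    return f"<html>{p.schema()} rows</html>"  # p goes out of scope here

# Simulate three independent requests, collecting garbage after each one.
for _ in range(3):
    to_html([0] * 100_000)
    gc.collect()

# All three parsers (and their data) are still reachable via the cache.
alive = [o for o in gc.get_objects() if isinstance(o, Parser)]
assert len(alive) == 3
```

This mirrors the observed behavior: memory grows linearly with the number of DataFrames parsed, because the cache is keyed per instance and never evicts (and even a bounded `maxsize` would only cap, not prevent, the retention).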
Impact
- Memory leaks occur simply by invoking `pygwalker.to_html(df)` multiple times on different DataFrames within the same process.
- Long-running sessions that parse multiple DataFrames retain unnecessary memory, with process RAM climbing over time.
Versions
- pygwalker version: 0.4.9.15
- python version: 3.10.17
- browser