[BUG] Memory Leak in Data Parsers: Instance Methods Decorated with @lru_cache Retain Large DataFrames #723

@detecti1

Description

Describe the bug

Several instance methods and properties in the data parser classes are decorated with Python's functools.lru_cache. Because lru_cache includes the bound instance (self) in its cache key, these method-level caches hold strong references to each data parser object. Each parser in turn holds a reference to its associated DataFrame (which may be very large), so neither the parser nor its DataFrame can be garbage collected after use. In a long-lived process that parses multiple DataFrames, such as a web server that receives a request, calls pygwalker.to_html(df) to render a visualization, and returns the result to the client, memory usage therefore accumulates with every parse, leading to leaks and excessive memory consumption.
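The retention mechanism can be demonstrated in isolation with a minimal stand-in class (Parser and schema are hypothetical names, not pygwalker's actual API):

```python
import functools
import gc
import weakref

class Parser:
    """Stand-in for a data parser holding a large object."""
    def __init__(self, data):
        self.data = data  # imagine a large DataFrame here

    @functools.lru_cache(maxsize=None)
    def schema(self):
        # The cache lives on the class-level function object and is
        # keyed by (self,), so it strongly references this instance.
        return len(self.data)

p = Parser([0] * 1000)
p.schema()
w = weakref.ref(p)

del p
gc.collect()
print(w() is not None)  # True: the lru_cache still keeps the parser alive

Parser.schema.cache_clear()
gc.collect()
print(w() is None)      # True: clearing the cache releases the instance
```

The key point is that the cache is attached to the decorated function on the class, not to the instance, so its lifetime is unbounded by default.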

To Reproduce
Steps to reproduce the behavior:

  1. Create a large DataFrame (hundreds of MB).
  2. Call pygwalker.to_html(df) (or equivalent), creating a parser and triggering rendering.
  3. Remove references to the DataFrame and parser, then manually trigger garbage collection.
  4. Observe that process memory usage remains elevated; with each subsequent parse of a new DataFrame, memory usage grows.
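The steps above can be mirrored without pygwalker using the same stand-in pattern (LeakyParser and field_specs are hypothetical names), with tracemalloc substituting for process-level RSS measurement:

```python
import functools
import gc
import tracemalloc

class LeakyParser:
    """Hypothetical stand-in for a pygwalker data parser."""
    def __init__(self, data):
        self.data = data

    @functools.lru_cache(maxsize=None)
    def field_specs(self):
        return len(self.data)

tracemalloc.start()
for _ in range(5):
    parser = LeakyParser(bytearray(10_000_000))  # ~10 MB per "DataFrame"
    parser.field_specs()                         # populates the class-level cache
    del parser                                   # step 3: drop our references...
    gc.collect()                                 # ...and force a collection
current, _peak = tracemalloc.get_traced_memory()
print(current)  # still ~50 MB reachable via the lru_cache (step 4)
```

Each iteration's parser (and its payload) survives collection because the cache on LeakyParser.field_specs still references it, so retained memory grows linearly with the number of parses.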

Impact

  • Memory leaks occur simply by invoking pygwalker.to_html(df) multiple times on different DataFrames within the same process.
  • Long-running sessions parsing multiple DataFrames will retain unnecessary memory, with process RAM climbing over time.
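One common remedy (a sketch only, not necessarily the fix pygwalker will adopt) is to cache per instance rather than on the class, e.g. with functools.cached_property, so the cache dies with the parser:

```python
import functools
import gc
import weakref

class Parser:
    """Hypothetical sketch: per-instance caching via cached_property."""
    def __init__(self, data):
        self.data = data

    @functools.cached_property
    def schema(self):
        # cached_property stores the result in the instance __dict__,
        # so no class-level structure outlives the parser.
        return len(self.data)

p = Parser([0] * 1000)
_ = p.schema
w = weakref.ref(p)

del p
gc.collect()
print(w() is None)  # True: nothing else keeps the parser alive
```

For methods that take arguments, a per-instance dict cache or a bounded cache keyed on weak references would achieve the same lifetime behavior.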

Versions

  • pygwalker version: 0.4.9.15
  • python version: 3.10.17
  • browser

Labels

bug (Something isn't working)