Description
When running long-lived iterative queries from Node.js against three backends (DuckDB only, DuckDB + SQLite scanner, DuckDB + Postgres scanner), RSS grows roughly linearly only in the Postgres path (into multiple GB), while V8 heapUsed stays at roughly ≤100 MB and looks similar across all modes. This suggests growth in native memory outside the JS heap, likely in the Postgres scanner path.
A few months back I theorized that the issue was in the Postgres scanner (https://www.reddit.com/r/DuckDB/comments/1itnxxv/duckdb_memory_growth_patterns_in_longrunning/), where I saw many pages being allocated over time, but only today did I test with different backends.
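For context, the per-iteration memory sampling in memory_test.js is roughly along these lines (a simplified sketch, not the exact code from the repo; it only relies on process.memoryUsage() and the optional global.gc() exposed by --expose-gc):

// Simplified sketch of the sampling done each iteration (assumed shape, not the repo code)
const stats = [];
function sampleMemory(iteration) {
  if (process.env.FORCE_GC === 'true' && global.gc) {
    global.gc(); // only available when node runs with --expose-gc
  }
  const { rss, heapUsed, external, arrayBuffers } = process.memoryUsage();
  stats.push({
    iteration,
    rssMB: rss / 1048576,            // grows without bound in the Postgres path
    heapUsedMB: heapUsed / 1048576,  // stays <= ~100 MB in all modes
    externalMB: external / 1048576,
    arrayBuffersMB: arrayBuffers / 1048576,
  });
}

The divergence between rssMB and heapUsedMB in the logs is what points at native allocations.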
Environment
- OS: Ubuntu/Debian (x86_64)
- Node.js: v24.3.0 (node -v)
- duckdb-node package: 1.2.0-alpha.14 (npm ls duckdb)
- PostgreSQL server: PostgreSQL 17.6 (Ubuntu 17.6-1.pgdg22.04+1)
Repro repo / script
https://github.com/renatocron/duckdb-memory-leak-test
- Repo contains memory_test.js (runs the same analytical query across modes) and produces large JSON logs with memory stats.
- Also included: mem_compare.py to align/downsample the logs and plot them or emit a reduced JSON for the web view.
How to reproduce
# 1) Install deps
npm install
# 2) Create Postgres DB (tool populates tables automatically)
createdb duckdb_memory_test # or use psql as in the README
# 3) Run tests (examples)
# a) Postgres path (suspected leak):
TEST_MODE=postgres QUERY_INTERVAL=1000 MAX_ITERATIONS=0 \
NEW_CONNECTION=false FORCE_GC=false \
node --expose-gc memory_test.js
# b) DuckDB-only baseline:
TEST_MODE=duckdb node --expose-gc memory_test.js
# c) SQLite scanner:
TEST_MODE=sqlite node --expose-gc memory_test.js
# 4) Compare results
./mem_compare.py memory_stats_*.json --output-plot plot.png --target-points 400
# (Optional) Interactive web view:
./mem_compare.py memory_stats_*.json --output-json reduced.json
python3 -m http.server 8080
Observed behavior
- RSS (MB):
  - Postgres (orange): steady linear growth up to ~2.4 GB across ~450k iterations.
  - DuckDB (blue): plateaus at ~250–350 MB with occasional steps.
  - SQLite (green): sawtooth between ~300–1000 MB, but no unbounded linear growth.
- V8 heapUsed (MB): slow, similar growth across all modes (~7 MB → ~90 MB).
Expected behavior
- RSS should plateau or oscillate within a bounded range once steady-state is reached, similar to DuckDB-only runs. Large, iteration-proportional native memory growth in the Postgres path is unexpected.
Artifacts
- PNG plots (aligned to a common max iteration):
  - RSS vs iteration: attached as plot.png
  - heapUsed vs iteration: attached as heap_used.png
- Raw logs: memory_stats_postgres_*.json, memory_stats_sqlite_*.json, memory_stats_duckdb_*.json (each ~96 MB at ~500k iterations).
- If desired, maintainers can use mem_compare.py ... --output-json reduced.json to produce a tiny repro dataset.
Notes / hypotheses
- Growth seems native (outside V8).
- Candidate areas: Postgres scanner result materialization, Arrow/Vector buffers, lifetimes of DataChunk/ColumnDataCollection, libpq result handling, the prepared statement cache, or per-iteration allocations that are never released.
- A DuckDB memory limit is set (e.g., 500MB), but RSS for the Postgres path exceeds it over time, suggesting allocations that are not accounted for in DuckDB's limit or that happen outside its allocator (see the sketch below).
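To make the last point concrete, here is a stripped-down sketch of what one Postgres-path iteration looks like conceptually. It assumes the classic duckdb Node API; the DSN and the query/table are placeholders, and the real logic lives in memory_test.js:

// Conceptual sketch of the Postgres path (placeholders: PG_DSN, pg.some_table); not the repo code
const duckdb = require('duckdb');
const db = new duckdb.Database(':memory:');

// Load the scanner, cap DuckDB's own memory, attach the Postgres database.
db.exec(`
  INSTALL postgres; LOAD postgres;
  SET memory_limit='500MB';
  ATTACH '${process.env.PG_DSN || 'dbname=duckdb_memory_test'}' AS pg (TYPE postgres);
`, (err) => {
  if (err) throw err;
  runIteration(0);
});

function runIteration(i) {
  // Placeholder for the analytical query used in memory_test.js
  db.all('SELECT count(*) FROM pg.some_table', (err) => {
    if (err) throw err;
    // memory_limit is 500MB, yet process RSS keeps climbing on this path,
    // which is what suggests allocations outside DuckDB's accounting.
    console.log(i, Math.round(process.memoryUsage().rss / 1048576), 'MB RSS');
    setTimeout(() => runIteration(i + 1), Number(process.env.QUERY_INTERVAL || 1000));
  });
}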
Things I can try if helpful
- Repeat using the DuckDB CLI (no Node) with INSTALL postgres; LOAD postgres; and a tight loop of the same query, to rule Node out. Is there a tool that could help with this? Running ~500k iterations by hand does not seem feasible.
- Swap the allocator to jemalloc and dump its stats; try MALLOC_CONF=prof:true,prof_active:true. I already tried this on the production worker and it did not change the behavior.
Thank you!
Happy to run any diagnostic builds or patches.