Fix O(n) memory growth in PDF conversion by calling page.close() afte… #1612

Open
lesyk wants to merge 4 commits into microsoft:main from lesyk:u/vilesyk/perf

Conversation

@lesyk (Contributor) commented Mar 12, 2026

Problem
Fixes #1611. Since v0.1.5, converting large PDFs causes memory usage to grow linearly with page count (~1.1 MiB/page). A 400-page PDF consumes ~458 MiB instead of the expected ~7 MiB. The root cause is that pdfplumber's per-page cached properties (_rect_edges, _curve_edges, _edges, _objects, _layout) and get_textmap LRU cache are never freed during conversion.

Benchmark Results

| Pages | Before (peak) | After (peak) | Saving |
| ----- | ------------- | ------------ | ------ |
| 100   | 109.3 MiB     | 3.2 MiB      | 97%    |
| 200   | 225.6 MiB     | 4.5 MiB      | 98%    |
| 400   | 458.1 MiB     | 6.8 MiB      | 99%    |
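For reviewers, here is a minimal sketch of the pattern this PR describes: closing each pdfplumber page as soon as it has been converted, so its cached properties and `get_textmap` LRU entry are freed immediately. This is an illustration, not the actual `_pdf_converter.py` diff; the function name `pdf_to_text` is made up for the example.

```python
def pdf_to_text(path: str) -> str:
    """Extract text from a PDF with O(1) memory in the page count.

    Illustrative sketch of the fix in this PR, assuming pdfplumber's
    Page.close() API, which releases the page's cached objects/layout.
    """
    import pdfplumber  # third-party: pip install pdfplumber

    chunks = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            chunks.append(page.extract_text() or "")
            # Without this call, each Page keeps its cached _objects,
            # _layout, edges, and textmap alive until the PDF is closed,
            # producing the ~1.1 MiB/page linear growth reported in #1611.
            page.close()
    return "\n".join(chunks)
```

The key point is that the cleanup happens inside the loop, per page, rather than once at the end when the whole document is closed.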

@lesyk (Contributor, Author) commented Mar 12, 2026

Waiting for verification from @smarsou.

@smarsou commented Mar 13, 2026

Awesome @lesyk !

My issue #1611 is fixed by these changes.

I'm not sure whether I tested this correctly: I just copied your new version of _pdf_converter.py into the markitdown library (v0.1.5) in my venv, and it works perfectly! Thanks a lot!

@smarsou left a review comment

Fixes #1611.

Development

Successfully merging this pull request may close these issues.

Memory usage regression in 0.1.5 during PDF text extraction

3 participants