
fix: resolve silent freeze and progress bar issues in preprocessing pipeline on Windows #2993

Open
VAIDEHI-28 wants to merge 1 commit into MIC-DKFZ:master from VAIDEHI-28:fix/preprocessing-freeze-windows


@VAIDEHI-28
Contributor

Fixes #2729

Hey! I ran into this issue while setting up a Heart CT dataset on Windows and wanted
to dig into what was actually causing it. Turns out there were a few things going wrong
at the same time.

What was happening

When running nnUNetv2_plan_and_preprocess on Windows, the program would just freeze
silently after printing "Fingerprint extraction..." — no progress, no error, nothing.
You'd have to wait forever or force quit. On top of that, the progress bar was
behaving opposite to what you'd expect — hidden when --verbose was passed and
shown when it wasn't.

What I found

After digging through the code I found a few culprits:

  • The tqdm disable flag was inverted — disable=self.verbose instead of disable=not self.verbose
  • Worker results were being fetched twice with .get(), which caused silent hangs
  • On Windows, libiomp5md.dll was being loaded twice by spawned workers, crashing them silently
  • There was no timeout — so if workers got stuck, the program just waited forever
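For comparison with the double .get(), here is a minimal polling loop that retrieves each finished result exactly once. It is an illustrative sketch, not nnU-Net's code: `collect_results` is a hypothetical helper, and it uses `multiprocessing.pool.ThreadPool`, which exposes the same `apply_async`/`AsyncResult` API as `multiprocessing.Pool`, so the example runs without pickling or a spawn context.

```python
from multiprocessing.pool import ThreadPool
from time import sleep


def square(x: int) -> int:
    sleep(0.01)  # stand-in for a short preprocessing job
    return x * x


def collect_results(pool, func, items, poll_interval=0.05):
    """Hypothetical helper: submit one job per item, then fetch each result exactly once."""
    pending = {i: pool.apply_async(func, (item,)) for i, item in enumerate(items)}
    results = [None] * len(items)
    while pending:
        # ready() does not block; only jobs that have finished are collected this round
        done = [i for i, r in pending.items() if r.ready()]
        for i in done:
            # one .get() per job: returns the value or re-raises the worker's exception
            results[i] = pending.pop(i).get()
        if pending:
            sleep(poll_interval)
    return results


if __name__ == "__main__":
    with ThreadPool(4) as pool:
        print(collect_results(pool, square, range(5)))  # prints [0, 1, 4, 9, 16]
```

Removing each job from `pending` as soon as its result is fetched makes a second .get() on the same job impossible by construction, and worker exceptions still surface in the parent instead of being swallowed.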

What I changed

  • Fixed the inverted tqdm logic and added a descriptive label in fingerprint_extractor.py
  • Fixed the double .get() call in default_preprocessor.py
  • Added KMP_DUPLICATE_LIB_OK=TRUE before spawning workers to handle the Windows OMP issue
  • Added a 5-minute watchdog that raises a clear, actionable error if no progress is made
  • Improved the worker failure error message to actually tell you what to do
  • Removed the old commented-out code block that was no longer needed
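The watchdog can be kept independent of the polling loop: record when the last case finished and fail loudly once nothing has finished within a configurable window. This is a hypothetical sketch; `StallWatchdog`, `tick`, `check` and the 300-second default (matching the 5 minutes described above) are illustrative assumptions, not the PR's actual code. The clock is injectable so the behaviour can be tested without waiting.

```python
import time
from typing import Callable


class StallWatchdog:
    """Hypothetical helper: raise if no progress is reported within `stall_timeout` seconds."""

    def __init__(self, stall_timeout: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.stall_timeout = stall_timeout
        self.clock = clock
        self.last_progress = clock()

    def tick(self) -> None:
        # call whenever at least one case has finished
        self.last_progress = self.clock()

    def check(self) -> None:
        # call on every polling iteration; raises an actionable error when progress stalls
        idle = self.clock() - self.last_progress
        if idle > self.stall_timeout:
            raise RuntimeError(
                f"No preprocessing progress for {idle:.0f} s (limit {self.stall_timeout:.0f} s). "
                "Check the worker output for errors, try fewer processes, "
                "or raise the timeout for very large datasets."
            )
```

In a polling loop, `check()` would run once per iteration and `tick()` whenever cases complete. Exposing `stall_timeout` as a parameter keeps the limit adjustable for datasets whose individual cases take longer than a few minutes.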

Testing

Tested on Windows 11 with a dummy Heart CT dataset:

Before: OMP errors → silent freeze → RuntimeError crash with "6 feet under" message

After: Clean progress bars all the way through, full preprocessing completed
successfully for both 2d and 3d_fullres configurations with no errors.

@FabianIsensee
Member

Thanks for the PR! I fully agree with the first two points you fixed (double .get() and inverted flag)!
For some datasets the timeout of 5 minutes may be too short. I would prefer this to be higher, but that's a simple change that I can also make on my end.
What I am not knowledgeable about is the double import of the .dll file. Why does that happen, and what are the implications? If this is a Windows-only issue, I would expect an OS check to suppress this part on other systems.
Best,
Fabian

@FabianIsensee
Member

I am not at all familiar with Windows and the duplicate imports, but Codex seems to have some concerns about suppressing the duplicate import:

• KMP_DUPLICATE_LIB_OK=TRUE is risky as a production “fix” because it suppresses OpenMP duplicate-runtime errors instead of resolving the underlying conflict. It can turn a deterministic crash (for example OMP: Error #15) into non-deterministic behavior such as hangs, performance instability, or subtle correctness issues, especially under multiprocessing on Windows. In other words, it hides signal we need for diagnosis and may mask real packaging/runtime incompatibilities between dependencies. If kept at all, it should be narrowly scoped (Windows-only, explicit opt-in, documented as workaround), with follow-up to identify and remove the duplicate runtime at source.
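If the workaround is kept, the scoping suggested here (Windows-only, explicit opt-in, documented as a workaround) could look roughly like the sketch below. The opt-in variable `NNUNET_ALLOW_DUPLICATE_OMP` and the function name are hypothetical, chosen only for illustration; `KMP_DUPLICATE_LIB_OK` is the Intel OpenMP runtime's own variable. Spawned workers inherit the parent's environment, so the function would need to run in the parent before the pool is created.

```python
import os
import sys
import warnings
from typing import MutableMapping, Optional


def maybe_allow_duplicate_omp(env: Optional[MutableMapping[str, str]] = None,
                              platform: str = sys.platform) -> bool:
    """Hypothetical sketch of an opt-in, Windows-only OpenMP duplicate-runtime workaround.

    Returns True when the workaround is enabled; an existing KMP_DUPLICATE_LIB_OK value is left as-is.
    """
    env = os.environ if env is None else env
    if platform != "win32":
        return False  # other operating systems never get the workaround
    if env.get("NNUNET_ALLOW_DUPLICATE_OMP", "0") != "1":
        return False  # hypothetical opt-in flag; off by default so OMP Error #15 stays visible
    env.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")  # respect a value the user already set
    warnings.warn(
        "KMP_DUPLICATE_LIB_OK is enabled as a workaround for duplicate OpenMP runtimes; "
        "it may hide library conflicts.",
        RuntimeWarning,
    )
    return True
```

Defaulting to off keeps the deterministic crash on every system unless a Windows user explicitly asks for the workaround, and the warning leaves a trace in the logs when it is active.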

FabianIsensee self-assigned this Mar 4, 2026
@FabianIsensee
Member

Can you please isolate the two fixes (double .get() and inverted flag) into a separate PR while we continue the discussion on Windows imports? If you prefer, I can also try to selectively push these improvements, but your contribution would get lost in the process (which is why I would prefer you make a dedicated PR for them).

@VAIDEHI-28
Contributor Author

Hi Fabian, thank you so much for reviewing this and for the detailed feedback!

Completely understood! I'll isolate the double .get() fix and the inverted tqdm flag into a separate, clean PR right away so those improvements can be merged without the Windows import discussion blocking them.
I'll have the new PR up shortly!

@VAIDEHI-28
Contributor Author

Hi Fabian, I've split the fixes into two separate PRs as requested:

Happy to make any further adjustments if needed!



Development

Successfully merging this pull request may close these issues.

nnUNetv2_plan_and_preprocess stops without giving any error message

2 participants