I have 100,000 CWA files to process, but parallel execution is slower than serial #1441
When I test the speed using only one CWA file, it takes about 5 minutes to complete steps 1–6, which is acceptable. Why is parallel execution so much slower than the single-file test?
The amount of memory per core is probably the issue. If it is insufficient, it can slow down GGIR substantially. I wrote a paragraph about processing time in the documentation:
I suggest reducing your batch size from 6900 to around 1000. Your current batch size is likely causing memory accumulation that exhausts the 190 GB of RAM. In GGIR part 1, the parallel worker nodes persist throughout the entire foreach loop. R struggles with memory fragmentation over long runs, so processing 6900/44 ≈ 157 files per core allows this bloat to accumulate until the system starts swapping to disk, which triggers the massive slowdown.
I am reopening this discussion and turning it into an issue. It is not ideal that serial processing or a massive reduction in batch size are considered the only solutions. Some thoughts:
- Maybe GGIR should reserve not just one core per input file when running GGIR part 1, but two or three cores.
- The documentation needs to cover this challenge better.
- In relation to UK Biobank: I wonder whether it would help if they made the processed output of part 1 available, rather than expecting everyone to run the exact same time-consuming process on the CWA files, wasting time and computing resources. They could possibly make a new release once every two years.