I have 100,000 CWA files to process, but parallel execution is slower than serial #1441
When I test the speed using only one CWA file, it takes about 5 minutes to complete steps 1–6, which is acceptable. Why is parallel execution so much slower than the single-file test?
The amount of memory per core is probably the issue. If it is insufficient, it can slow down GGIR substantially. I wrote a paragraph about processing time in the documentation:
I suggest reducing your batch size from 6900 to around 1000. Your current batch size is likely causing memory accumulation that exhausts the 190 GB of RAM. In GGIR part 1, the parallel worker nodes persist throughout the entire foreach loop. R struggles with memory fragmentation over long runs, so processing 6900/44 ≈ 157 files per core allows this bloat to accumulate until the system starts swapping to disk, which triggers the massive slowdown.
I am reopening this discussion and turning it into an issue. It is not ideal that serial processing or a massive reduction in batch size are considered the only solutions. Some thoughts:
- Maybe GGIR should reserve not just one core per input file when running GGIR part 1, but two or three cores.
- The documentation needs to cover this challenge better.
- In relation to UK Biobank: I wonder whether it would help if they made the processed output of part 1 available, rather than expecting everyone to run the exact same time-consuming process on the CWA files, wasting time and computing resources. They could possibly make a new release once every two years.