Add parallel Xlsx writing via pcntl_fork #4834

Open
kemo wants to merge 3 commits into PHPOffice:master from kemo:multithread

@kemo
Contributor

kemo commented Mar 11, 2026

Summary

  • Adds opt-in parallel processing infrastructure (src/PhpSpreadsheet/Parallel/) using pcntl_fork() with automatic sequential fallback
  • Parallelizes per-sheet XML generation in Writer\Xlsx::save() — the single most CPU-intensive phase of writing
  • Zero overhead when disabled (default) — no forking, no temp files, identical code path to before
  • Safely degrades: 2-core machines auto-detect 1 worker → sequential path, no fork attempted

Architecture

| Component | Purpose |
| --- | --- |
| ParallelExecutor | Public API: map(tasks, worker) with auto-backend detection, memory safety checks |
| PcntlBackend | pcntl_fork() + temp-file IPC, batched execution, timeout + zombie reaping |
| SequentialBackend | No-op fallback, always available |
| CpuDetector | Multi-strategy CPU count detection (pcntl_cpu_count → /proc/cpuinfo → sysctl → nproc → fallback 2), cached |
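
To make the fork-plus-temp-file idea concrete, here is a minimal sketch of a map(tasks, worker) helper with a sequential fallback. It is not the PR's ParallelExecutor or PcntlBackend: the function name `parallelMapSketch` and the payload layout are hypothetical, and batching, timeouts, and the memory check are left out on purpose.

```php
<?php
/**
 * Hypothetical sketch: run $worker on each task in a forked child and pass
 * the result back through a temp file. Falls back to a plain loop when the
 * pcntl extension is missing or there is only one task.
 */
function parallelMapSketch(array $tasks, callable $worker): array
{
    if (!function_exists('pcntl_fork') || count($tasks) < 2) {
        return array_map($worker, $tasks); // sequential path, keys preserved
    }

    $children = []; // pid => [task key, temp file]
    $results = [];
    foreach ($tasks as $key => $task) {
        $file = tempnam(sys_get_temp_dir(), 'pmap');
        $pid = ($file === false) ? -1 : pcntl_fork();
        if ($pid === -1) {
            // No temp file or fork failed: run this task in the parent.
            if ($file !== false) {
                unlink($file);
            }
            $results[$key] = $worker($task);
            continue;
        }
        if ($pid === 0) {
            // Child: compute, write the serialized payload, and exit
            // without touching any other child's temp file.
            try {
                $payload = ['ok' => true, 'value' => $worker($task)];
            } catch (\Throwable $e) {
                $payload = ['ok' => false, 'value' => $e->getMessage()];
            }
            file_put_contents($file, serialize($payload));
            exit(0);
        }
        $children[$pid] = [$key, $file];
    }

    $error = null;
    foreach ($children as $pid => [$key, $file]) {
        pcntl_waitpid($pid, $status); // blocks until the child exits, so it is reaped
        $raw = file_get_contents($file);
        unlink($file);
        $payload = ($raw === false || $raw === '') ? false : unserialize($raw);
        if (!is_array($payload) || $payload['ok'] !== true) {
            $error ??= is_array($payload) ? (string) $payload['value'] : 'child produced no result';
            continue;
        }
        $results[$key] = $payload['value'];
    }
    if ($error !== null) {
        throw new \RuntimeException('Worker failed: ' . $error);
    }

    // Return results in the same order as the input tasks.
    $ordered = [];
    foreach (array_keys($tasks) as $key) {
        $ordered[$key] = $results[$key];
    }
    return $ordered;
}
```

Results must be serializable to cross the process boundary, which is why a design like this suits per-sheet XML strings better than live objects.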

Writer instance API

Parallel settings live on the Writer\Xlsx instance (not global statics):

$writer = new \PhpOffice\PhpSpreadsheet\Writer\Xlsx($spreadsheet);
$writer->setParallelEnabled(true);
$writer->setMaxWorkers(4); // Optional: null = auto-detect based on CPU count
$writer->save('output.xlsx');

Worker count formula

min(cpuCount - 1, taskCount, 8): reserves 1 core for the parent and caps the pool at 8 workers to limit IPC overhead.
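
As a rough illustration, the formula could be written as the function below. The name `workerCountSketch` is hypothetical, and the clamp to a minimum of 1 is an assumption added here so that a single-core host still resolves to the sequential path; the PR text does not say how that case is handled.

```php
<?php
// Hypothetical helper mirroring min(cpuCount - 1, taskCount, 8).
function workerCountSketch(int $cpuCount, int $taskCount, int $cap = 8): int
{
    // Reserve one core for the parent; never exceed the task count or the cap.
    $workers = min($cpuCount - 1, $taskCount, $cap);

    // Assumption: never return less than 1 (1 worker means "run sequentially").
    return max(1, $workers);
}
```

With these inputs, a 2-core machine yields 1 worker, an 11-core machine with 5 sheets yields 5, and an 11-core machine with 10 sheets yields 8 because of the cap.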

Safety

  • Child process isolation: $isChild flag prevents children from running parent cleanup (could delete unread temp files)
  • Memory limit check: Estimates 30% dirty-page overhead per fork, auto-reduces worker count if tight
  • Timeout: Configurable per-task timeout (default 60s), parent sends SIGTERM to hung children
  • Zombie prevention: pcntl_waitpid() in finally block reaps all children, even on exceptions
  • posix_kill guarded: function_exists() check, numeric signal value (15) avoids undefined constant
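
The memory check above could look roughly like the following sketch. Both function names are hypothetical, and the budgeting model (headroom under memory_limit divided by 30% of current usage per fork) is one possible reading of the bullet, not the PR's actual logic.

```php
<?php
// Hypothetical parser for memory_limit values such as "-1", "128M", "1G".
// Returns null for "unlimited".
function parseMemoryLimitSketch(string $limit): ?int
{
    $limit = trim($limit);
    if ($limit === '-1') {
        return null;
    }
    if (preg_match('/^(\d+)\s*([KMG]?)$/i', $limit, $m) !== 1) {
        throw new \InvalidArgumentException("Unrecognized memory limit: $limit");
    }
    $units = ['' => 1, 'K' => 1024, 'M' => 1024 ** 2, 'G' => 1024 ** 3];

    return (int) $m[1] * $units[strtoupper($m[2])];
}

// Assumed model: each fork dirties about $overhead of the parent's current
// usage, and all of that has to fit in the remaining headroom.
function fitWorkersSketch(int $wanted, int $usedBytes, ?int $limitBytes, float $overhead = 0.30): int
{
    if ($limitBytes === null) {
        return $wanted; // unlimited: nothing to reduce
    }
    $perFork = (int) ceil($usedBytes * $overhead);
    if ($perFork <= 0) {
        return $wanted;
    }
    $headroom = max(0, $limitBytes - $usedBytes);
    $fits = intdiv($headroom, $perFork);

    return max(1, min($wanted, $fits)); // 1 means the sequential path
}
```

In a real writer the inputs would presumably come from `ini_get('memory_limit')` and `memory_get_usage(true)`.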

Benchmarks (Apple M2 Pro, 11 cores)

| Workload | Sequential | Parallel | Speedup |
| --- | --- | --- | --- |
| 5 sheets × 5K rows (250K cells) | 2.64s | 1.82s | 1.45x |
| 10 sheets × 5K rows (500K cells) | 6.66s | 4.07s | 1.63x |

Speedup is bounded by ZIP compression (~45% of save() time), which still runs sequentially after the parallelized XML generation (~48%).
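
That bound can be estimated with Amdahl's law. The sketch below assumes the ~48% XML phase parallelizes perfectly, that the split is the same for both workloads, and that the benchmarks used auto-detected worker counts (5 and 8 on 11 cores). None of these is stated in the PR, and the function name is hypothetical.

```php
<?php
// Amdahl's law: speedup = 1 / (serial + parallel / workers).
function amdahlSpeedupSketch(float $parallelFraction, int $workers): float
{
    $serialFraction = 1.0 - $parallelFraction;

    return 1.0 / ($serialFraction + $parallelFraction / $workers);
}

$fiveSheets = amdahlSpeedupSketch(0.48, 5); // 5 sheets, 5 workers
$tenSheets = amdahlSpeedupSketch(0.48, 8);  // 10 sheets, capped at 8 workers
$ceiling = 1.0 / (1.0 - 0.48);              // infinitely many workers
```

Under those assumptions the ideal figures are about 1.62x, 1.72x, and 1.92x respectively, so the measured 1.45x and 1.63x sit below the theoretical limit, as expected once fork and IPC costs are included.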

Test plan

  • 44 tests, 579 assertions — all passing
  • PHPStan level 10 clean (0 new errors)
  • Platform-specific and defensive error paths annotated with @codeCoverageIgnore
  • ParallelXlsxWriterTest: validates round-trip read-back of parallel-written files
  • ParallelXlsxWriterTest: sequential vs parallel produce identical cell data
  • ParallelXlsxWriterTest: single-sheet parallel-enabled falls back to sequential
  • Timeout test: verifies hung children are killed and exception propagated
  • Error propagation test: child exceptions surface in parent
  • Memory limit parsing: covers -1, M, G suffix formats
  • Xlsx writer parallel setter/getter coverage

@oleibman
Collaborator

I have for some time been actively engaged in eliminating static settings in favor of instance properties, to make testing easier and to avoid problems in shared environments. The effort is far from finished, but I'd still like to avoid static when possible. Here, you could make your new switches instance variables of the writer rather than global static settings. Would you be able to make that change?

kemo force-pushed the multithread branch 2 times, most recently from e6e8be5 to 3ce4ece, March 12, 2026 10:53
Comment thread samples/Chart/32_Chart_read_write.php Outdated
$inputFileType = 'Xlsx';
$inputFileNames = __DIR__ . '/../templates/32readwrite*[0-9].xlsx';

/** @var string[] $argv */
Collaborator

Why is this change needed?

Contributor Author

You're right, this is unrelated to the parallel processing changes — it was a stray static analysis suppression. I'll remove it.

@kemo
Contributor Author

kemo commented Mar 24, 2026

parallelEnabled and maxWorkers are instance properties on the Xlsx writer, not static settings. The writer is configured per-instance via setParallelEnabled() and setMaxWorkers().
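
For illustration only, per-instance settings of this kind might be shaped like the class below. `ParallelSettingsSketch` is a hypothetical stand-in, not the real Writer\Xlsx; the getter names, the fluent `static` return type, and the validation of maxWorkers are assumptions.

```php
<?php
// Hypothetical stand-in for the writer's parallel settings.
final class ParallelSettingsSketch
{
    private bool $parallelEnabled = false; // off by default
    private ?int $maxWorkers = null;       // null = auto-detect from CPU count

    public function setParallelEnabled(bool $enabled): static
    {
        $this->parallelEnabled = $enabled;

        return $this;
    }

    public function getParallelEnabled(): bool
    {
        return $this->parallelEnabled;
    }

    public function setMaxWorkers(?int $maxWorkers): static
    {
        // Assumption: reject values below 1 rather than silently clamping.
        if ($maxWorkers !== null && $maxWorkers < 1) {
            throw new \InvalidArgumentException('maxWorkers must be null or >= 1');
        }
        $this->maxWorkers = $maxWorkers;

        return $this;
    }

    public function getMaxWorkers(): ?int
    {
        return $this->maxWorkers;
    }
}
```

Because the state lives on the object, two writers in the same process can be configured differently without affecting each other, which is the property the review asked for.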

@kemo kemo requested a review from oleibman March 25, 2026 15:41
@ddevsr
Contributor

ddevsr commented Apr 20, 2026

@kemo You could use fidry/cpu-core-counter instead to handle the more complex CPU detection cases, such as virtual hosts. That library is used by PHPStan, Rector, and PHP-CS-Fixer for their parallel runs.
