Submit draft of RFC for ForEach-Object -Parallel proposal #194
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,99 @@ | ||
--- | ||
RFC: RFCnnnn | ||
Author: Paul Higinbotham | ||
Status: Draft | ||
SupercededBy: N/A | ||
Version: 1.0 | ||
Area: Engine | ||
Comments Due: July 18, 2019 | ||
Plan to implement: Yes | ||
--- | ||
|
||
# PowerShell ForEach-Object -Parallel Cmdlet | ||
|
||
This RFC proposes a new parameter set for the existing ForEach-Object cmdlet to parallelize script block executions, instead of running them sequentially as it does now. | ||
|
||
## Motivation | ||
|
||
As a PowerShell User, | ||
I can do simple fan-out concurrency with the PowerShell ForEach-Object cmdlet, without having to obtain and load a separate module, or deal with PowerShell jobs unless I want to. | ||
|
||
|
||
## Specification | ||
|
||
There will be two new parameter sets added to the existing `ForEach-Object` cmdlet to support both synchronous and asynchronous operations for parallel script block execution. | ||
For the synchronous case, the `ForEach-Object` cmdlet will not return until all parallel executions complete. | ||
For the asynchronous case, the `ForEach-Object` cmdlet will immediately return a PowerShell job object that contains a child job for each parallel execution. | ||
|
||
### Implementation details | ||
|
||
|
||
Implementation will be similar to the ThreadJob module. | ||
The script block will run once for each piped input object, each run on a separate thread and runspace. | ||
|
||
The number of threads that run at a time will be limited by a `-ThrottleLimit` parameter with a default value. | ||
Piped input that exceeds the allowed number of threads will be queued until a thread is available. | ||
For synchronous operation, a `-TimeoutSecs` parameter will be available that terminates the wait for completion after a specified time. | ||
Without a `-TimeoutSecs` parameter, the cmdlet will wait indefinitely for completion. | ||
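
The throttling and queueing described above can be approximated today with a runspace pool. The following is a minimal sketch, not the proposed implementation: `Invoke-ParallelSketch` is a hypothetical helper name, and it assumes the existing `RunspacePool` API is used to cap concurrency, with excess work items queued by the pool until a runspace becomes free.

```powershell
# Hypothetical helper for illustration only; not part of this RFC or of PowerShell.
function Invoke-ParallelSketch {
    param(
        [Parameter(Mandatory)] [object[]] $InputObject,
        [Parameter(Mandatory)] [scriptblock] $ScriptBlock,
        [int] $ThrottleLimit = 5
    )

    # The pool runs at most $ThrottleLimit runspaces at once; extra work waits in its queue.
    $pool = [runspacefactory]::CreateRunspacePool(1, $ThrottleLimit)
    $pool.Open()
    try {
        $work = foreach ($item in $InputObject) {
            $ps = [powershell]::Create()
            $ps.RunspacePool = $pool
            # The script text is re-parsed in the worker runspace; the item arrives as $args[0].
            $null = $ps.AddScript($ScriptBlock.ToString()).AddArgument($item)
            [pscustomobject]@{ Shell = $ps; Handle = $ps.BeginInvoke() }
        }
        foreach ($w in $work) {
            # EndInvoke blocks until that invocation finishes, then returns its output.
            $w.Shell.EndInvoke($w.Handle)
            $w.Shell.Dispose()
        }
    }
    finally {
        $pool.Dispose()
    }
}

$squares = Invoke-ParallelSketch -InputObject (1..5) -ScriptBlock { $args[0] * $args[0] } -ThrottleLimit 2
```

Because the sketch collects results in submission order, output order matches input order. The RFC does not state whether the proposed cmdlet would preserve order, so that is a property of this sketch only.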
|
||
### Synchronous parameter set | ||
|
||
|
||
Synchronous `ForEach-Object -Parallel` returns after all script blocks finish running or the timeout elapses. | ||
|
||
```powershell | ||
ForEach-Object -Parallel -ThrottleLimit 10 -TimeoutSecs 1800 -ScriptBlock {} | ||
``` | ||
|
||
- `-Parallel` : parameter switch specifies fan-out parallel script block execution | ||
|
||
- `-ThrottleLimit` : parameter takes an integer value that determines the maximum number of threads | ||
|
||
- `-TimeoutSecs` : parameter takes an integer that specifies the maximum time to wait for completion in seconds | ||
|
||
|
||
### Asynchronous parameter set | ||
|
||
Asynchronous ForEach-Object -Parallel immediately returns a job object for monitoring parallel script block execution | ||
|
||
|
||
```powershell | ||
ForEach-Object -Parallel -ThrottleLimit 5 -AsJob -ScriptBlock {} | ||
``` | ||
|
||
- `-Parallel` : parameter switch specifies fan-out parallel script block execution | ||
|
||
- `-ThrottleLimit` : parameter takes an integer value that determines the maximum number of threads | ||
|
||
- `-AsJob` : parameter switch returns a job object | ||
|
||
### Variable passing | ||
|
||
|
||
ForEach-Object -Parallel will support the PowerShell `$_` current piped item variable within each script block. | ||
It will also support the `$using:` directive for passing variables from script scope into the parallel executed script block scope. | ||
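
The `$using:` behavior described here already exists for thread jobs, which gives a rough idea of what the parallel script block would see. This is a sketch under the assumption that the ThreadJob module (`Start-ThreadJob`) is available in the session; the variable names are illustrative only.

```powershell
# Assumes the ThreadJob module (Start-ThreadJob) is available in the session.
$logNames = 'System', 'SQL'

$job = Start-ThreadJob -ScriptBlock {
    # $using: copies the caller's value into the job's runspace when the job starts.
    $names = $using:logNames
    "Requested logs: $($names -join ', ')"
}

# Wait for the job to finish, collect its output, and remove the job.
$message = $job | Receive-Job -Wait -AutoRemoveJob
```

The value is passed into the job's runspace, so reassigning the variable inside the script block does not affect the caller's `$logNames`.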
|
||
### Examples | ||
|
||
```powershell | ||
$computerNames = 'computer1','computer2','computer3','computer4','computer5' | ||
$logs = $computerNames | ForEach-Object -Parallel -ThrottleLimit 10 -TimeoutSecs 1800 -ScriptBlock { | ||
Get-Logs -ComputerName $_ | ||
} | ||
``` | ||
|
||
```powershell | ||
$computerNames = 'computer1','computer2','computer3','computer4','computer5' | ||
$job = ForEach-Object -Parallel -ThrottleLimit 10 -InputObject $computerNames -AsJob -ScriptBlock { | ||
Get-Logs -ComputerName $_ | ||
} | ||
$logs = $job | Wait-Job | Receive-Job | ||
``` | ||
|
||
```powershell | ||
$computerNames = 'computer1','computer2','computer3','computer4','computer5' | ||
$logNames = 'System','SQL' | ||
$logs = ForEach-Object -Parallel -InputObject $computerNames -ScriptBlock { | ||
Get-Logs -ComputerName $_ -LogNames $using:logNames | ||
} | ||
``` | ||
|
||
## Alternate Proposals and Considerations | ||
|
||
|
||
Another option (and a previous RFC proposal) is to resurrect the PowerShell Windows workflow script `foreach -parallel` keyword to be used in normal PowerShell script to perform parallel execution of foreach loop iterations. | ||
However, the majority of the community felt it would be more useful to update the existing `ForEach-Object` cmdlet with a `-Parallel` parameter set. | ||
We may want to eventually implement both solutions. | ||
But the ForEach-Object -Parallel proposal in this RFC should be implemented first since it is currently the most popular. |
I think it complicates the ForEach-Object cmdlet too much and limits future development opportunities.
From previous discussion #174 (comment)
Yesterday I happened to see a GNU utility, "parallel --pipe". We could also find "Where-Object -Parallel" useful, and "Select-String", and others. This suggests that maybe we need something more general for pipeline parallelization (Parallel-Pipe):
$hugeArray | Parallel-Pipe { Where-Object Name -eq "test" }
$hugeArray | Parallel-Pipe {Select-String -Pattern "test" }
$hugeArray | Parallel-Pipe -AsJob { ... }
A rose by any other name would smell as sweet:
$hugeArray | ForEach-Object -Parallel { $_ | Where-Object Name -eq "test" }
$hugeArray | ForEach-Object -Parallel { $_ | Select-String -Pattern "test" }
$hugeArray | ForEach-Object -Parallel -AsJob { ... }
I think in the name of single-purpose, composable cmdlets having a single parallel cmdlet (rather than trying to parallelise each individually) is the right way to go. But I think ForEach-Object has that brief of Select-like enumeration. And just like LINQ has .AsParallel(), I think -Parallel makes sense to do this. But that's just a brief and not-strongly-held opinion :)
There is a deeper problem with "ForEach-Object -Parallel": it has a lot of parameters, and how will "-Begin/Process/End" etc. work, in the "ForEach-Object -Parallel" context or in the scriptblock context? I think we will have to duplicate parameters. In that case Parallel-Pipe is simpler and safer.
Parallel-Pipe -InputObject $data
-Begin { … }
-Process { Foreach-Object -Begin { … } -Process { … } -End { … } }
-End { … }
-Begin, -Process, -End switches will not be part of the new parameter set, since they don't make sense in this parallel case.
They do have a meaning, especially in binary cmdlets. It should work just like in ForEach-Object (why else do we have the blocks there?). Rather, the difficulty is in the implementation.
The current design looks like a workaround: it is simple to implement but keeps all the current limitations (see @alx9r's comments below). My suggestion is to implement natively parallel pipelines, which would give significant flexibility, performance and extensibility.
I think it does make sense to keep the
that exists in the sequential for each.
If we're removing `-Begin`, `-Process`, and `-End` then why are you proposing to overload `ForEach-Object` instead of simply adding a new command? There's literally nothing else in this command, so you're really creating an entirely new command! `ForEach-Object` is currently one of the worst performing cmdlets in PowerShell (compare `$power = 1..1e5 | foreach { $_ * 2 }` to `$power = 1..1e5 | & { process { $_ * 2 } }`) and although I doubt this change will make it slower, it will make it more complicated -- reducing the possibility of ever fixing it 🤨
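
For anyone who wants to reproduce the comparison above, a rough way to time the two forms is sketched below. It only uses `Measure-Command`; the variable names are illustrative, and the absolute numbers will depend on the machine and PowerShell version, so no expected timings are given.

```powershell
# Times the two forms quoted in the comment; results vary by machine and PowerShell version.
$viaCmdlet = Measure-Command { $a = 1..1e5 | ForEach-Object { $_ * 2 } }
$viaProcessBlock = Measure-Command { $b = 1..1e5 | & { process { $_ * 2 } } }

'ForEach-Object: {0:N0} ms' -f $viaCmdlet.TotalMilliseconds
'process block:  {0:N0} ms' -f $viaProcessBlock.TotalMilliseconds
```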