Async task parallelization #6
Pull Request Test Coverage Report for Build 19540738427
💛 - Coveralls
aeddins-ibm
left a comment
Great to see the speedup in those lightcone plots at the top of the PR!
Commenting even though this is WIP. I'm suspicious of the use of nonlocal seemingly to share write access to an object across processes, but OTOH it's apparently working so far...
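As an illustration of when `nonlocal` is safe together with a `multiprocessing.Pool`: the `callback` passed to `apply_async` runs in the parent process (in the pool's result-handling thread), not in a worker, so a `nonlocal` write made there does reach the parent's state. The sketch below is not taken from this PR. `collect_with_callback` and `on_done` are hypothetical names, and `math.factorial` merely stands in for a real task. A `nonlocal` write made inside the worker function itself would, by contrast, only change that worker's own copy.

```python
import math
import multiprocessing as mp


def collect_with_callback(values, num_processes=2):
    """Sum task results via a callback that writes to a nonlocal variable."""
    total = 0

    def on_done(result):
        # Executed in the parent process, so this write is visible
        # to the enclosing function once the task has finished.
        nonlocal total
        total += result

    with mp.Pool(processes=num_processes) as pool:
        handles = [
            pool.apply_async(math.factorial, (v,), callback=on_done)
            for v in values
        ]
        # The callback is invoked before the result is marked ready,
        # so waiting on every handle guarantees `total` is complete.
        for handle in handles:
            handle.wait()
    return total
```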
Co-authored-by: aeddins-ibm <[email protected]>
For reference purposes, this is what the logging output now looks like:
This refactors the parallelization of the unequal time commutator bound computation.
The parallelization is still done using a `multiprocessing.Pool` with a user-configurable number of processes and maximum computation timeout.
However, prior to this PR the code processed the circuit layer by layer: the bound computations of all noise terms appearing in a single layer had to complete before moving on to the next one. This can hinder performance when only a few of these terms take a long time to compute, during which time the remaining processes end up idling.
This PR updates the implementation to submit the computation of all noise term bounds across all layers as asynchronous tasks to be processed in parallel. This avoids the bottleneck above, as pending tasks can continue to be processed even while long-running ones occupy individual processes.
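The general pattern can be sketched as follows. This is a minimal illustration and not the code in this PR: `compute_bounds_async`, the `(layer, term)` keys, and the choice to record `None` for tasks that miss the deadline are all assumptions made for the example.

```python
import multiprocessing as mp
import time


def compute_bounds_async(tasks, func, num_processes=2, timeout=10.0):
    """Submit every task up front and collect results until a shared deadline.

    `tasks` maps a hypothetical (layer_index, term_index) key to the
    argument tuple passed to `func`, which must be picklable.
    """
    results = {}
    with mp.Pool(processes=num_processes) as pool:
        # Submit all tasks across all layers at once; no per-layer barrier.
        pending = {
            key: pool.apply_async(func, args) for key, args in tasks.items()
        }
        deadline = time.monotonic() + timeout
        for key, handle in pending.items():
            remaining = max(0.0, deadline - time.monotonic())
            try:
                results[key] = handle.get(timeout=remaining)
            except mp.TimeoutError:
                # Assumption: unfinished tasks are recorded as None.
                results[key] = None
    # Leaving the `with` block terminates the pool and any unfinished workers.
    return results
```

Because all tasks are queued immediately, a worker that finishes a short task picks up the next pending one regardless of which layer it belongs to, while a slow task only ties up the single process running it.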
This allowed a test computation of the forward bounds on a mirrored time evolution using 20 Trotter steps on a 40-qubit kicked Ising Hamiltonian to complete entirely within 55 minutes on 96 parallel processes, while the old implementation timed out after processing 54 of the 80 layers.
Before this PR can be merged, I still need to address the following:
(opt.) add some unittesting of the timeout behavior