Description
Here's a code snippet in pygsti/optimize/customsolve.py on the head of master.
pyGSTi/pygsti/optimize/customsolve.py
Lines 157 to 168 in c74fcb2
smbuf1b = _np.zeros(1, _np.int64)
smbuf2 = _np.zeros(2, _np.int64)
smbuf3 = _np.zeros(3, _np.int64)
for icol in range(a.shape[1]):
    # Step 1: find the index of the row that is the best pivot.
    # each proc looks for its best pivot (Note: it should not consider rows already pivoted on)
    potential_pivot_indices = all_row_indices[potential_pivot_mask]
    ibest_global, ibest_local, h, k = _find_pivot(a, b, icol, potential_pivot_indices, my_row_slice,
                                                  shared_floats, shared_ints, resource_alloc, comm, host_comm,
                                                  smbuf1, smbuf2, smbuf3, host_index_buf, host_val_buf)
The smbuf1b variable is never used. It looks like it should be passed as an argument to _find_pivot, immediately after smbuf1, per the signature below (the corresponding parameter names in the signature are just buf1 and buf1b):
pyGSTi/pygsti/optimize/customsolve.py
Lines 213 to 214 in c74fcb2
def _find_pivot(a, b, icol, potential_pivot_inds, my_row_slice, shared_floats, shared_ints,
                resource_alloc, comm, host_comm, buf1, buf1b, buf2, buf3, best_host_indices, best_host_vals):
As a consequence of smbuf1b not being passed to _find_pivot, we're calling _find_pivot with only 15 positional arguments when its signature requires 16, so this code path raises a TypeError as soon as it runs.
This bug rarely surfaces because there's a dedicated code path (below) for handling linear solves with matrices of order < 10,000, and that path gets triggered in our testing and even in most large GST runs:
pyGSTi/pygsti/optimize/customsolve.py
Lines 98 to 126 in c74fcb2
if comm.size < proc_threshold and a.shape[1] < 10000:
    # We're not exactly sure where scipy is better, but until we speed up / change gaussian-elim
    # alg the scipy alg is much faster for small numbers of procs and so should be used unless
    # A is too large to be gathered to the root proc.
    global_a, a_shm = ari.gather_jtj(a, return_shared=True)
    global_b, b_shm = ari.gather_jtf(b, return_shared=True)
    #global_a = ari.gather_jtj(a)
    #global_b = ari.gather_jtf(b)
    if comm.rank == 0:
        try:
            global_x = _scipy.linalg.solve(global_a, global_b, assume_a='pos')
            ok_buf[0] = 1  # ok
        except _scipy.linalg.LinAlgError as e:
            ok_buf[0] = 0  # failure!
            err = e
    else:
        global_x = None
        err = _scipy.linalg.LinAlgError("Linear solver fail on root proc!")  # just in case...
    comm.Bcast(ok_buf, root=0)
    if ok_buf[0] == 0:
        _smt.cleanup_shared_ndarray(a_shm)
        _smt.cleanup_shared_ndarray(b_shm)
        raise err  # all procs must raise in sync
    ari.scatter_x(global_x, x)
    _smt.cleanup_shared_ndarray(a_shm)
    _smt.cleanup_shared_ndarray(b_shm)
    return
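For context, the dense solve the root proc performs in this path is scipy's symmetric-positive-definite solver. A standalone sketch with toy data (the jac/global_a/global_b names here are illustrative, not pyGSTi objects):

```python
import numpy as np
from scipy.linalg import solve

# Toy stand-in for the gathered J^T J / J^T f normal-equation system
# that the root proc solves.
rng = np.random.default_rng(0)
jac = rng.standard_normal((20, 5))
global_a = jac.T @ jac                    # symmetric positive definite (J has full column rank)
global_b = jac.T @ rng.standard_normal(20)

# assume_a='pos' lets scipy take a Cholesky-based path and raise
# LinAlgError if the matrix turns out not to be positive definite.
global_x = solve(global_a, global_b, assume_a='pos')
assert np.allclose(global_a @ global_x, global_b)
```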
Possible ways to move forward:
- Temporary. Just raise the dimension threshold from 10,000 to 100,000. Storing one double-precision matrix of that order requires ~80 GB, and the overhead of the Python call means this will require ~160 GB of "free" RAM to run. That's a reasonable amount for modern compute servers.
- Try to fix the bug in the custom Gaussian elimination code. Have the branch on matrix size check a user-modifiable constant instead of a hard-coded value, so that our tests can exercise this code with a small threshold.
- Ambitious. Create a new ArrayInterface class that relies on Dask arrays. Replace the custom Gaussian elimination code with calls to Dask's distributed linear algebra functions: https://docs.dask.org/en/stable/_modules/dask/array/linalg.html.
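The ~80 GB / ~160 GB figures in the first option are just dense-storage arithmetic; a quick sanity check (the 2x factor is the issue's own estimate of Python-call overhead, not a measured value):

```python
n = 100_000                        # proposed new dimension threshold
bytes_per_float64 = 8
matrix_bytes = n * n * bytes_per_float64

# One dense float64 matrix of order 100,000:
print(matrix_bytes / 1e9)          # 80.0 (GB)

# Gathering to the root proc plus Python-level copies roughly double the
# footprint, giving the ~160 GB of "free" RAM quoted above.
print(2 * matrix_bytes / 1e9)      # 160.0 (GB)
```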