-
Notifications
You must be signed in to change notification settings - Fork 26
Open
Description
The current hashing approach concatenates three SHA-1 hashes into a single string:
def _hash_csr_matrix(self, matrix):
return (hashlib.sha1(matrix.indices).hexdigest() +
hashlib.sha1(matrix.indptr).hexdigest() +
hashlib.sha1(matrix.data).hexdigest())
def _is_already_factorized(self, A):
if type(self.factorized_A) == str:
return self._hash_csr_matrix(A) == self.factorized_A
else:
return self._csr_matrix_equal(A, self.factorized_A)This always computes all three hashes and compares the full concatenated string.
We could store hashes as a tuple and use short-circuit evaluation as a minor optimization:
def _hash_array(self, array):
return hashlib.sha1(array).hexdigest()
def _hash_csr_matrix(self, matrix):
return (self._hash_array(matrix.indptr),
self._hash_array(matrix.indices),
self._hash_array(matrix.data))
def _is_already_factorized(self, A):
if type(self.factorized_A) == tuple:
hash_indptr, hash_indices, hash_data = self.factorized_A
return (
(self._hash_array(A.indptr) == hash_indptr) and
(self._hash_array(A.indices) == hash_indices) and
(self._hash_array(A.data) == hash_data)
)
else:
return self._csr_matrix_equal(A, self.factorized_A)Like the all check in _csr_matrix_equal, Python's and operator short-circuits; if indptr hashes don't match, it skips computing the other two hashes entirely. Comparing three strings should be cheaper than generating SHA-1 hashes.
I'll post benchmarks to verify the performance improvement.
Metadata
Metadata
Assignees
Labels
No labels