Hi, I’d like to suggest a minor performance improvement in the following snippet:
a = np.array([0 if c.isupper() or c == '-' else 1 for c in line])
if np.sum(a) > 0:
This can be slightly optimized as:
a = np.array([0 if c.isupper() or c == '-' else 1 for c in line])
if a.sum() > 0:
Since a is already a NumPy ndarray, calling np.sum(a) goes through an extra layer. The top-level np.sum function is a Python-level wrapper that has to handle inputs that are not ndarrays and may delegate to custom array types (via __array_function__) before the reduction runs. This layer exists to support flexibility, but it adds a small, fixed overhead to every call. In contrast, a.sum() invokes the ndarray method directly and skips that wrapper. The saving is per call and does not grow with the array size, so it matters mainly when this line runs many times. When the object is known to be an ndarray, the method form is also slightly more concise.
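To check the difference on a given machine, a quick micro-benchmark along these lines could be used. This is only a sketch: the input string below is a made-up stand-in for a line from the parsed file, and the absolute timings will vary with hardware and NumPy version.

```python
import timeit

import numpy as np

# Hypothetical input, standing in for one line of the file being parsed.
line = "ACDEFGHIKLmnpq--RSTVWY" * 10

# Same expression as in the snippet: 0 for uppercase letters and '-', 1 otherwise.
a = np.array([0 if c.isupper() or c == '-' else 1 for c in line])

# Both spellings must return the same value.
assert np.sum(a) == a.sum()

# Time each call many times; only the relative difference is of interest.
n = 20_000
t_func = timeit.timeit(lambda: np.sum(a), number=n)
t_meth = timeit.timeit(lambda: a.sum(), number=n)

print(f"np.sum(a): {t_func / n * 1e6:.2f} us per call")
print(f"a.sum():   {t_meth / n * 1e6:.2f} us per call")
```

Whatever the measured gap, building the array with the list comprehension is likely to cost more than either sum call, so the change is best seen as a small cleanup rather than a major speedup.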
protein_generator/model/parsers.py
Line 56 in 94b13b0