Description
System Info
- Python: 3.10.18
- bitsandbytes: 0.47.0
- CPU: Intel(R) Xeon(R) CPU E5-2682 v4 @ 2.50GHz
- System: centos 7
Reproduction
from bitsandbytes.functional import quantize_blockwise, create_dynamic_map
import torch

if __name__ == "__main__":
    block_size = 256

    # First block has block_size - 1 zeros and one non-zero element.
    # CPU: success
    tensor = torch.rand(10000)
    tensor[: block_size - 1] = 0
    state, quant_state = quantize_blockwise(A=tensor, blocksize=block_size)
    print(state)

    # GPU, same tensor: success
    state, quant_state = quantize_blockwise(A=tensor.cuda(), blocksize=block_size)
    print(state)

    # First block is entirely zero.
    # GPU: success
    tensor = torch.rand(10000)
    tensor[:block_size] = 0
    state, quant_state = quantize_blockwise(A=tensor.cuda(), blocksize=block_size)
    print(state)

    # CPU, same tensor: Segmentation fault (core dumped)
    state, quant_state = quantize_blockwise(A=tensor, blocksize=block_size)
    print(state)
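Because the crash kills the interpreter, it can help to run the failing case in a child process so the outcome can be inspected from a surviving parent. The following is a minimal sketch under stated assumptions: `run_isolated`, `describe`, and `CPU_CASE` are hypothetical names introduced here, not part of bitsandbytes, and the negative-return-code convention applies to POSIX systems only.

```python
import signal
import subprocess
import sys


def run_isolated(snippet):
    """Run a Python snippet in a child process and return its exit code."""
    result = subprocess.run([sys.executable, "-c", snippet])
    return result.returncode


def describe(returncode):
    """Translate an exit code into a short status string (POSIX semantics)."""
    if returncode == 0:
        return "success"
    if returncode < 0:
        # On POSIX, a negative code means the child was killed by that signal.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with code " + str(returncode)


# Hypothetical snippet mirroring the failing CPU case from the report.
CPU_CASE = """
import torch
from bitsandbytes.functional import quantize_blockwise
tensor = torch.rand(10000)
tensor[:256] = 0
quantize_blockwise(A=tensor, blocksize=256)
"""

if __name__ == "__main__":
    print(describe(run_isolated(CPU_CASE)))
```

If the report holds on the affected setup, the parent would be expected to report a signal rather than exit itself; the exact output depends on the installed bitsandbytes version and platform.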

Expected behavior
When every element of a block is zero, quantize_blockwise runs successfully on the GPU but crashes with a segmentation fault on the CPU. The CPU path should handle an all-zero block without crashing, as the GPU path does.
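To make the difference between the passing and failing inputs concrete, here is a dependency-free sketch that computes the per-block absolute maximum for both tensors from the reproduction. It is an illustration only: `block_absmax` and `normalize_block` are hypothetical helpers, not bitsandbytes functions, and the report does not confirm that a zero absmax is the actual cause of the CPU crash. The sketch only shows that the failing input is the one whose first block has an absmax of exactly zero, which is a degenerate case for any scheme that scales a block by its absmax.

```python
import random


def block_absmax(values, blocksize):
    """Return the maximum absolute value of each consecutive block."""
    return [
        max(abs(v) for v in values[i:i + blocksize])
        for i in range(0, len(values), blocksize)
    ]


def normalize_block(block, absmax):
    """Scale a block into [-1, 1]; an all-zero block is returned as zeros."""
    if absmax == 0.0:
        # Guard against dividing by zero for an all-zero block.
        return [0.0] * len(block)
    return [v / absmax for v in block]


block_size = 256
random.seed(0)

# Passing case: block_size - 1 zeros, so the first block keeps one non-zero value.
passing = [random.random() for _ in range(10000)]
passing[: block_size - 1] = [0.0] * (block_size - 1)

# Failing case: the whole first block is zero.
failing = [random.random() for _ in range(10000)]
failing[:block_size] = [0.0] * block_size

passing_absmax = block_absmax(passing, block_size)
failing_absmax = block_absmax(failing, block_size)

print("passing first-block absmax is non-zero:", passing_absmax[0] > 0.0)
print("failing first-block absmax is zero:", failing_absmax[0] == 0.0)
```

The explicit zero check in `normalize_block` is one possible way to treat the degenerate block; how the library's CPU and GPU kernels handle it is not stated in this report.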