Conversation

grilloandrea6
I noticed that some operations in the current code fail when running without a CUDA-enabled GPU.

  • Timing: The implementation used torch.cuda.Event, which is only available on CUDA devices. It now falls back to the standard Python time module when CUDA is not available, so timing works correctly on both CPU and GPU.

  • jit_forward method: The previous version used torch.matmul(W, input, out=input). On CPU this produces incorrect results because the out tensor aliases an input operand, so elements can be overwritten before they are read. The fix assigns the result to a new tensor instead, avoiding the overlapping write.

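A minimal sketch of the device-aware timing described above (the helper name `timed` and its signature are illustrative, not from the PR):

```python
import time

import torch


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed milliseconds).

    Uses CUDA events when a GPU is available, and falls back to
    time.perf_counter on CPU, as the PR description suggests.
    """
    if torch.cuda.is_available():
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        result = fn(*args)
        end.record()
        # CUDA launches are asynchronous; synchronize before reading the timer.
        torch.cuda.synchronize()
        elapsed_ms = start.elapsed_time(end)
    else:
        t0 = time.perf_counter()
        result = fn(*args)
        elapsed_ms = (time.perf_counter() - t0) * 1000.0
    return result, elapsed_ms
```

Note that `torch.cuda.synchronize()` is required before `elapsed_time`, since kernel launches return before the GPU finishes.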
The in-place torch.matmul (with out= aliasing an input operand) does not give correct results on CPU.
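A small sketch contrasting the two patterns (the tensor names `W` and `x` are illustrative; the buggy call is left commented out so the snippet runs cleanly):

```python
import torch

W = torch.randn(3, 3)
x = torch.randn(3, 3)

expected = W @ x  # reference result computed without aliasing

# Buggy pattern from the old code: `out` aliases an input operand,
# so on CPU elements of x may be overwritten before they are read.
# torch.matmul(W, x, out=x)

# Fixed pattern per the PR description: write into a fresh tensor
# and rebind the name, avoiding the overlapping write.
x = torch.matmul(W, x)

assert torch.allclose(x, expected)
```

Rebinding `x` allocates a new output tensor, so the multiply reads the original operand intact.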