CUDA: Conv2d tensor core #16828
Changes from 13 commits
19596b1
96db627
2cd9fb0
d633cee
ac5e0c0
410171a
4ae58ad
51f85ff
6049576
cc3d366
c7259fa
1809814
e3f94c6
e1ab1f0
2e1c881
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -0,0 +1,373 @@ |
| | | #include "common.cuh" |
| | | #include "conv2d-tensor-core.cuh" |
| | | #include "convert.cuh" |
| | | #include "mma.cuh" |
| | | |
| | | #define CEIL_DIV(M, N) (((M) + (N) - 1) / (N)) |
| | | |
| | | static uint32_t ceil_div(uint32_t M, uint32_t N); |
| | | |
| | | constexpr size_t ceil_div(const size_t m, const size_t n) { |
Already exists in common.
llama.cpp/ggml/src/ggml-cuda/common.cuh
Line 653 in 1ae7488
| static __device__ __forceinline__ uint32_t fastdiv(uint32_t n, const uint3 fastdiv_values) { |
Remove the macro and use the function instead.
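To illustrate the suggestion, here is a minimal sketch of a function-based ceiling division that could stand in for the `CEIL_DIV` macro. The name `ceil_div_sketch` is hypothetical and is not the helper from `common.cuh`; the sketch also assumes `n > 0` and that `m + n - 1` does not overflow the chosen type.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper for illustration only; not the function in common.cuh.
// Unlike the CEIL_DIV macro, each argument is evaluated exactly once and
// both operands are forced to the same type T.
// Assumes n > 0 and that m + n - 1 does not overflow T.
template <typename T>
constexpr T ceil_div_sketch(const T m, const T n) {
    return (m + n - 1) / n;
}

// Compile-time checks: the division rounds up, and exact divisions are unchanged.
static_assert(ceil_div_sketch<uint32_t>(10, 4) == 3, "10 / 4 rounds up to 3");
static_assert(ceil_div_sketch<size_t>(8, 4) == 2, "8 / 4 stays 2");
```

Because the function is `constexpr`, it may still be usable where the macro was used in constant expressions, such as for tile counts or array sizes, though whether it fits every call site in this PR is not something the diff excerpt confirms.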