Cutlass_EX

0. Introduction

Goal: Develop 4-bit primitive kernels using CUTLASS.

1. Example List

example_1) custom code with CUTLASS

example_2) cutlass::uint4b_t
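
Since `cutlass::uint4b_t` is a 4-bit unsigned type, two values fit in one byte. The sketch below illustrates that packing idea in plain C++ without any CUTLASS headers. The helper names `pack_u4` and `unpack_u4` are hypothetical, and the nibble order (even index in the low nibble) is an assumption of this sketch rather than a statement about CUTLASS internals.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not CUTLASS API): pack unsigned 4-bit values two per
// byte. Assumption: the even-indexed element goes into the low nibble.
std::vector<std::uint8_t> pack_u4(const std::vector<std::uint8_t>& values) {
    std::vector<std::uint8_t> packed((values.size() + 1) / 2, 0);
    for (std::size_t i = 0; i < values.size(); ++i) {
        assert(values[i] < 16 && "value does not fit in 4 bits");
        const unsigned shift = (i % 2 == 0) ? 0u : 4u;
        packed[i / 2] = static_cast<std::uint8_t>(packed[i / 2] | (values[i] << shift));
    }
    return packed;
}

// Hypothetical helper: read back the i-th 4-bit value from the packed buffer.
std::uint8_t unpack_u4(const std::vector<std::uint8_t>& packed, std::size_t i) {
    assert(i / 2 < packed.size());
    const unsigned shift = (i % 2 == 0) ? 0u : 4u;
    return static_cast<std::uint8_t>((packed[i / 2] >> shift) & 0x0Fu);
}
```

For example, packing the five values 1, 2, 15, 0, 7 needs three bytes, and `unpack_u4(packed, 2)` returns 15.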

example_3) single-precision gemm template

00_basic_gemm
This kernel computes the general matrix product (GEMM) using single-precision floating-point arithmetic and assumes that all matrices have column-major layout.
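
To make the column-major convention concrete, here is a minimal host-side reference GEMM in plain C++. It is a sketch for checking results, not the CUTLASS kernel itself. It assumes the usual form C = alpha * A * B + beta * C, and the function name `reference_sgemm_colmajor` and its parameter order are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host reference: C = alpha * A * B + beta * C.
// All matrices are column-major, so element (row, col) of a matrix with
// leading dimension ld lives at index row + col * ld.
// A is M x K, B is K x N, C is M x N.
void reference_sgemm_colmajor(int M, int N, int K, float alpha,
                              const std::vector<float>& A, int lda,
                              const std::vector<float>& B, int ldb,
                              float beta, std::vector<float>& C, int ldc) {
    assert(lda >= M && ldb >= K && ldc >= M);
    for (int j = 0; j < N; ++j) {
        for (int i = 0; i < M; ++i) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) {
                const std::size_t a_idx = static_cast<std::size_t>(i) + static_cast<std::size_t>(k) * lda;
                const std::size_t b_idx = static_cast<std::size_t>(k) + static_cast<std::size_t>(j) * ldb;
                acc += A[a_idx] * B[b_idx];
            }
            const std::size_t c_idx = static_cast<std::size_t>(i) + static_cast<std::size_t>(j) * ldc;
            C[c_idx] = alpha * acc + beta * C[c_idx];
        }
    }
}
```

A reference like this is typically run on small problem sizes and compared element by element against the GPU output.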

example_4) mixed-precision gemm template with cutlass utilities

01_cutlass_utilities
These utilities are supporting components for managing tensor and matrix memory allocations, initializing and comparing results, and computing reference output.

example_5) CUTLASS debugging tool

02_dump_reg_shmem
Demonstrates the CUTLASS debugging tool for dumping fragments and shared memory.
Dumping: recording the state of memory at a specific point in time.

example_6) CUTLASS layout visualization example

03_visualize_layout

example_7) CUTLASS example to compute a batched strided gemm in two different ways

05_batched_gemm
strided batched gemm: specify a pointer to the first matrix of the batch and the stride between consecutive matrices of the batch.
array gemm: copy the pointers to all matrices of the batch to device memory.
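
The difference between the two addressing schemes can be sketched on the host in plain C++. This is only an illustration of how each scheme locates matrix `b` of the batch. The function names are hypothetical, and a real array GEMM would additionally copy the pointer table to device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Strided addressing: matrix b starts b * batch_stride elements after the
// first matrix, so one pointer and one stride describe the whole batch.
const float* strided_matrix(const float* first, std::size_t batch_stride, std::size_t b) {
    return first + b * batch_stride;
}

// Array addressing: one explicit pointer per matrix. Here the table is built
// from a strided buffer for simplicity, but the entries could point anywhere.
std::vector<const float*> make_pointer_array(const float* first,
                                             std::size_t batch_stride,
                                             std::size_t batch_count) {
    assert(first != nullptr);
    std::vector<const float*> ptrs(batch_count);
    for (std::size_t b = 0; b < batch_count; ++b) {
        ptrs[b] = strided_matrix(first, batch_stride, b);
    }
    return ptrs;
}
```

The strided form needs no extra memory traffic for the pointers, while the array form allows matrices that are not evenly spaced in memory.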

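example_8) CUTLASS Turing gemm using tensor cores

example_9) CUTLASS Turing convolution using tensor cores

example_10) CUTLASS Ampere convolution using tensor cores

example_11) Handling Cutlass Tensors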

example_12) Simple CUTLASS convolution using Tensor core

2. Guide

    cd example_{number}
    mkdir build
    cd build
    cmake ..
    make
    ./main

3. Reference

CUTLASS: https://github.com/NVIDIA/cutlass
