A compatibility runtime to execute CUDA code on Apple Silicon chips by translating calls to Metal.
sillyCUDA is an open-source research project designed to bridge the gap between NVIDIA's proprietary CUDA ecosystem and Apple's Metal API. The goal is to build a translation layer that intercepts standard CUDA calls and executes them on Apple Silicon (M1/M2/M3/M4) GPUs using Metal Compute.
This project is primarily an educational journey into GPU architecture, driver development, and cross-platform compatibility. It is not intended to replace NVIDIA hardware for high-performance enterprise workloads, but rather to allow students, researchers, and "Mac enjoyers" to run and study CUDA kernels without needing a dedicated green team GPU.
Read the full introduction story on Medium: "Introducing sillyCUDA: My Quest to Learn CUDA as a Mac Enjoyer".
sillyCUDA operates by mimicking the CUDA C API and translating logic in real-time. The architecture consists of three main components:
- The Runtime (`cuda_runtime_api`):
  - Intercepts standard calls like `cudaMalloc` and `cudaLaunchKernel`.
  - Manages memory allocation via `MTLBuffer` instead of NVIDIA drivers.
- The Translator:
  - Parses CUDA kernels (C++ dialect).
  - Generates an Abstract Syntax Tree (AST).
  - Transpiles logic into valid Metal Shading Language (MSL).
- The Metal Backend:
  - Handles the execution pipeline on Apple Silicon.
  - Manages command queues and compute states.
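To make the runtime component more concrete, here is a minimal sketch of how intercepted allocation and copy calls could be routed through a handle table. It is not sillyCUDA's implementation: `SketchRuntime`, `FakeBuffer`, `CopyKind`, `sketchMalloc`, `sketchMemcpy`, `sketchFree`, and `roundTrip` are all hypothetical names, and a plain host byte array stands in for an `MTLBuffer` so the example builds without Metal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an MTLBuffer. A host byte array models a buffer
// that both CPU and GPU can address.
struct FakeBuffer {
    std::vector<std::uint8_t> bytes;
};

enum class CopyKind { HostToDevice, DeviceToHost };

// Hypothetical runtime state: maps each "device pointer" handed to the caller
// onto the buffer that backs it.
class SketchRuntime {
public:
    // Same shape as cudaMalloc: writes a pointer, returns 0 on success.
    int sketchMalloc(void** devPtr, std::size_t size) {
        if (devPtr == nullptr || size == 0) return 1;  // this sketch rejects both
        auto buf = std::make_unique<FakeBuffer>();
        buf->bytes.resize(size);
        void* handle = buf->bytes.data();
        table_.emplace(handle, std::move(buf));
        *devPtr = handle;
        return 0;
    }

    // Same shape as cudaMemcpy, restricted to base pointers for brevity.
    int sketchMemcpy(void* dst, const void* src, std::size_t count, CopyKind kind) {
        assert(dst != nullptr && src != nullptr);
        const void* dev = (kind == CopyKind::HostToDevice) ? dst : src;
        auto it = table_.find(dev);
        if (it == table_.end() || count > it->second->bytes.size()) return 1;
        std::memcpy(dst, src, count);
        return 0;
    }

    // Same shape as cudaFree: returns 0 if the pointer was a live allocation.
    int sketchFree(void* devPtr) {
        const void* key = devPtr;
        return table_.erase(key) == 1 ? 0 : 1;
    }

    std::size_t liveAllocations() const { return table_.size(); }

private:
    std::unordered_map<const void*, std::unique_ptr<FakeBuffer>> table_;
};

// Round-trips four floats through the fake device buffer.
bool roundTrip() {
    SketchRuntime rt;
    float in[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float out[4] = {};
    void* d = nullptr;
    if (rt.sketchMalloc(&d, sizeof(in)) != 0) return false;
    if (rt.sketchMemcpy(d, in, sizeof(in), CopyKind::HostToDevice) != 0) return false;
    if (rt.sketchMemcpy(out, d, sizeof(out), CopyKind::DeviceToHost) != 0) return false;
    const bool same = std::memcmp(in, out, sizeof(in)) == 0;
    return rt.sketchFree(d) == 0 && same;
}
```

In a real backend the table entry would presumably hold a Metal buffer created with shared storage rather than a `std::vector`, which would let the copy stay a plain `memcpy` on Apple Silicon's unified memory. That mapping is an assumption about the design, not something the project states.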
Currently, the project is in the initial development phase.
- Phase 1: Mastering GPU Architecture & Basics (Vector Add, Matrix Mult).
- Phase 2: Native Metal reimplementation of CUDA concepts.
- Phase 3 (Current Goal): The "Vertical Slice" — getting a standard Vector Addition CUDA program to compile and run on macOS without modifying source code.
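For readers new to the vertical-slice target, the sketch below models what a vector-addition kernel launch computes, using a serial CPU loop in place of the GPU. It only illustrates the indexing scheme `blockIdx * blockDim + threadIdx` and the ceiling-division grid size; `ThreadCtx`, `vectorAddBody`, `launchSerial`, and `vectorAdd` are hypothetical names, and the block size of 256 is an arbitrary choice for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mirror of CUDA's built-in index variables (x dimension only).
struct ThreadCtx {
    unsigned blockIdx;
    unsigned blockDim;
    unsigned threadIdx;
};

// The body a CUDA vector-add kernel typically has, written as a plain function.
void vectorAddBody(const ThreadCtx& t, const float* a, const float* b,
                   float* c, std::size_t n) {
    const std::size_t i =
        static_cast<std::size_t>(t.blockIdx) * t.blockDim + t.threadIdx;
    if (i < n) c[i] = a[i] + b[i];  // guard: the last block may overshoot n
}

// Hypothetical serial "launch": visits every (block, thread) pair in order.
// A GPU would run these concurrently; the loop only models the indexing.
void launchSerial(unsigned gridDim, unsigned blockDim, const float* a,
                  const float* b, float* c, std::size_t n) {
    for (unsigned blk = 0; blk < gridDim; ++blk)
        for (unsigned th = 0; th < blockDim; ++th)
            vectorAddBody(ThreadCtx{blk, blockDim, th}, a, b, c, n);
}

// Host-side wrapper: picks a grid size that covers all n elements.
std::vector<float> vectorAdd(const std::vector<float>& a,
                             const std::vector<float>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::vector<float> c(n, 0.0f);
    const unsigned blockDim = 256;
    const unsigned gridDim =
        static_cast<unsigned>((n + blockDim - 1) / blockDim);  // ceil(n / blockDim)
    launchSerial(gridDim, blockDim, a.data(), b.data(), c.data(), n);
    return c;
}
```

With 1000 elements this launches 4 blocks of 256 threads, so 24 threads fall past the end of the arrays and are skipped by the bounds check. A translated Metal kernel would likely read a single global index from MSL's `thread_position_in_grid` attribute instead of computing it from block and thread indices, though how sillyCUDA maps the two is not specified here.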
| Feature | Status |
|---|---|
| Basic memory management (malloc, memcpy) | ✅ Planned |
| Kernel launches | ✅ Planned |
| Simple math operations | ✅ Planned |
| Texture memory | ❌ Out of scope |
| Advanced shared memory | ❌ Out of scope |
| Highly optimized cuBLAS libraries | ❌ Out of scope |
Target Performance: ~50% of native Metal execution speed.
Requirements: macOS 13+, Xcode Command Line Tools, CMake.
```bash
# Clone the repository
git clone https://github.com/lwi00/sillyCUDA.git
cd sillyCUDA

# Create build directory
mkdir build && cd build

# Configure and build
cmake ..
make
```
This is an open journey! Whether you are a student learning compilers or a graphics engineer, contributions are welcome.
- Report Bugs: Open an issue if you find something broken (there will be many).
- Discuss: Share ideas on how to improve the translation layer.
This project is licensed under the Apache 2.0 License. See the LICENSE file for details.
Made with ❤️ and ☕️ by lwi00.