sillyCUDA

A compatibility runtime to execute CUDA code on Apple Silicon chips by translating calls to Metal.

📖 About

sillyCUDA is an open-source research project designed to bridge the gap between NVIDIA's proprietary CUDA ecosystem and Apple's Metal API. The goal is to build a translation layer that intercepts standard CUDA calls and executes them on Apple Silicon (M1/M2/M3/M4) GPUs using Metal Compute.

This project is primarily an educational journey into GPU architecture, driver development, and cross-platform compatibility. It is not intended to replace NVIDIA hardware for high-performance enterprise workloads, but rather to allow students, researchers, and "Mac enjoyers" to run and study CUDA kernels without needing a dedicated green team GPU.

Read the full introduction story on Medium: "Introducing sillyCUDA: My Quest to Learn CUDA as a Mac Enjoyer"

🏗 Architecture

sillyCUDA operates by mimicking the CUDA C API and translating logic in real time. The architecture consists of three main components:

1. The Runtime (cuda_runtime_api):
   - Intercepts standard calls like cudaMalloc and cudaLaunchKernel.
   - Manages memory allocation via MTLBuffer instead of NVIDIA drivers.
2. The Translator:
   - Parses CUDA kernels (C++ dialect).
   - Generates an Abstract Syntax Tree (AST).
   - Transpiles logic into valid Metal Shading Language (MSL).
3. The Metal Backend:
   - Handles the execution pipeline on Apple Silicon.
   - Manages command queues and compute states.
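
To give a feel for the kind of mapping the Translator has to perform, here is a deliberately naive sketch that rewrites a handful of CUDA built-ins into MSL spellings using plain text substitution. It is not the project's AST-based approach, only an illustration of the vocabulary on both sides. The function names `replaceAll` and `translateBuiltins` and the MSL-side variable names `tid`, `bid`, `bdim` and `gdim` are assumptions made up for this example.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: replace every occurrence of `from` in `text` with `to`.
static std::string replaceAll(std::string text, const std::string& from,
                              const std::string& to) {
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size();  // skip past the text we just inserted
    }
    return text;
}

// Hypothetical toy translator (not sillyCUDA's real API).
// It assumes the generated MSL kernel declares parameters such as
//   uint3 tid  [[thread_position_in_threadgroup]]
//   uint3 bid  [[threadgroup_position_in_grid]]
//   uint3 bdim [[threads_per_threadgroup]]
//   uint3 gdim [[threadgroups_per_grid]]
// A real translator would work on an AST so that identifiers inside
// strings, comments, or longer names are not rewritten by accident.
std::string translateBuiltins(const std::string& cudaSource) {
    static const std::vector<std::pair<std::string, std::string>> rules = {
        {"__global__ void", "kernel void"},
        {"__shared__", "threadgroup"},
        {"__syncthreads()", "threadgroup_barrier(mem_flags::mem_threadgroup)"},
        {"threadIdx", "tid"},
        {"blockIdx", "bid"},
        {"blockDim", "bdim"},
        {"gridDim", "gdim"},
    };
    std::string out = cudaSource;
    for (const auto& rule : rules) {
        out = replaceAll(out, rule.first, rule.second);
    }
    return out;
}
```

Calling `translateBuiltins("int i = blockIdx.x * blockDim.x + threadIdx.x;")` yields the same index expression written with `bid`, `bdim` and `tid`. Pointer parameters would additionally need an address space such as `device` and a `[[buffer(n)]]` binding in MSL, which this sketch does not attempt.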

🚀 Status & Roadmap

Currently, the project is in the initial development phase.

Phase 1: Mastering GPU Architecture & Basics (Vector Add, Matrix Mult).
Phase 2: Native Metal reimplementation of CUDA concepts.
Phase 3 (Current Goal): The "Vertical Slice" — getting a standard Vector Addition CUDA program to compile and run on macOS without modifying source code.
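
To make the vertical-slice target concrete, the sketch below emulates a 1-D vector addition launch serially on the CPU. It touches neither Metal nor the real CUDA API; it only shows the grid and block index arithmetic that any translation has to preserve. `Dim3`, `LaunchContext`, `launch1D` and `vectorAdd` are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for CUDA's dim3.
struct Dim3 {
    unsigned x = 1, y = 1, z = 1;
};

// Hypothetical bundle of the built-in variables a CUDA kernel can read.
struct LaunchContext {
    Dim3 gridDim, blockDim, blockIdx, threadIdx;
};

using Kernel = std::function<void(const LaunchContext&)>;

// Emulates kernel<<<blocks, threadsPerBlock>>> by visiting every
// (block, thread) pair one after another. A GPU runs these in parallel;
// the serial loop is only valid for kernels without inter-thread
// synchronization, such as vector addition.
void launch1D(unsigned blocks, unsigned threadsPerBlock, const Kernel& kernel) {
    LaunchContext ctx;
    ctx.gridDim.x = blocks;
    ctx.blockDim.x = threadsPerBlock;
    for (unsigned b = 0; b < blocks; ++b) {
        for (unsigned t = 0; t < threadsPerBlock; ++t) {
            ctx.blockIdx.x = b;
            ctx.threadIdx.x = t;
            kernel(ctx);
        }
    }
}

// Vector addition shaped like the usual CUDA kernel.
// Assumes a and b have the same length.
std::vector<float> vectorAdd(const std::vector<float>& a,
                             const std::vector<float>& b) {
    const std::size_t n = a.size();
    std::vector<float> c(n, 0.0f);
    const unsigned threads = 256;
    // Round up so the last, possibly partial, block is still launched.
    const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
    launch1D(blocks, threads, [&](const LaunchContext& ctx) {
        const std::size_t i =
            static_cast<std::size_t>(ctx.blockIdx.x) * ctx.blockDim.x +
            ctx.threadIdx.x;
        if (i < n) {  // bounds guard for threads past the end of the data
            c[i] = a[i] + b[i];
        }
    });
    return c;
}
```

With 1000 elements and 256 threads per block, four blocks are launched and the last 24 threads fall through the bounds guard without writing anything.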

Scope & Limitations (v1.0 Goals)

| Feature | Status |
| --- | --- |
| Basic memory management (`malloc`, `memcpy`) | ✅ Planned |
| Kernel launches | ✅ Planned |
| Simple math operations | ✅ Planned |
| Texture memory | ❌ Out of scope |
| Advanced shared memory | ❌ Out of scope |
| Highly optimized cuBLAS libraries | ❌ Out of scope |

Target Performance: ~50% of native Metal execution speed.

🛠 Building & Installation

Requirements: macOS 13+, Xcode Command Line Tools, CMake.

# Clone the repository
git clone https://github.com/lwi00/sillyCUDA.git
cd sillyCUDA

# Create build directory
mkdir build && cd build

# Configure and build
cmake ..
make

🤝 Contributing

This is an open journey! Whether you are a student learning compilers or a graphics engineer, contributions are welcome.

Report Bugs: Open an issue if you find something broken (there will be many).
Discuss: Share ideas on how to improve the translation layer.

📄 License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.

Made with ❤️ and ☕️ by lwi00.
