lwi00/sillyCUDA

sillyCUDA

A compatibility runtime to execute CUDA code on Apple Silicon chips by translating calls to Metal.


📖 About

sillyCUDA is an open-source research project designed to bridge the gap between NVIDIA's proprietary CUDA ecosystem and Apple's Metal API. The goal is to build a translation layer that intercepts standard CUDA calls and executes them on Apple Silicon (M1/M2/M3/M4) GPUs using Metal Compute.

This project is primarily an educational journey into GPU architecture, driver development, and cross-platform compatibility. It is not intended to replace NVIDIA hardware for high-performance enterprise workloads, but rather to allow students, researchers, and "Mac enjoyers" to run and study CUDA kernels without needing a dedicated green team GPU.

Read the full introduction story on Medium: Introducing sillyCUDA: My Quest to Learn CUDA as a Mac Enjoyer

🏗 Architecture

sillyCUDA operates by mimicking the CUDA C API and translating logic in real-time. The architecture consists of three main components:

  1. The Runtime (cuda_runtime_api):
    • Intercepts standard calls like cudaMalloc and cudaLaunchKernel.
    • Manages memory allocation via MTLBuffer instead of NVIDIA drivers.
  2. The Translator:
    • Parses CUDA kernels (C++ dialect).
    • Generates an Abstract Syntax Tree (AST).
    • Transpiles logic into valid Metal Shading Language (MSL).
  3. The Metal Backend:
    • Handles the execution pipeline on Apple Silicon.
    • Manages command queues and compute states.
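As a rough illustration of the Runtime component, the sketch below shows how a shim might expose functions shaped like cudaMalloc, cudaMemcpy, and cudaFree while tracking allocations in a table. It is not taken from the sillyCUDA sources: the `silly` namespace, the function names, and the error enum are hypothetical, and plain host memory from std::malloc stands in for the MTLBuffer that a Metal backend would create.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <unordered_map>

// Hypothetical names throughout: nothing here is taken from the sillyCUDA
// sources. Host memory stands in for MTLBuffer so the sketch runs anywhere.
namespace silly {

enum Error { Success = 0, ErrorInvalidValue = 1, ErrorMemoryAllocation = 2 };
enum MemcpyKind { HostToDevice = 1, DeviceToHost = 2 };

// Table of live "device" allocations: base pointer -> size in bytes.
// A Metal backend would keep a buffer handle here instead of just a size.
inline std::unordered_map<void*, std::size_t>& allocations() {
    static std::unordered_map<void*, std::size_t> table;
    return table;
}

// Same parameter shape as cudaMalloc(void** devPtr, size_t size).
// Simplification: zero-sized requests are rejected.
inline Error Malloc(void** devPtr, std::size_t size) {
    if (devPtr == nullptr || size == 0) return ErrorInvalidValue;
    void* p = std::malloc(size);  // stand-in for creating an MTLBuffer
    if (p == nullptr) return ErrorMemoryAllocation;
    assert(allocations().count(p) == 0);  // a fresh block is never already tracked
    allocations()[p] = size;
    *devPtr = p;
    return Success;
}

// Same parameter shape as cudaMemcpy(dst, src, count, kind).
// Simplification: the device pointer must be the base of a tracked allocation.
inline Error Memcpy(void* dst, const void* src, std::size_t count, MemcpyKind kind) {
    if (dst == nullptr || src == nullptr) return ErrorInvalidValue;
    const void* device = (kind == HostToDevice) ? dst : src;
    auto it = allocations().find(const_cast<void*>(device));
    if (it == allocations().end() || count > it->second) return ErrorInvalidValue;
    std::memcpy(dst, src, count);
    return Success;
}

// Same parameter shape as cudaFree(void* devPtr); a null pointer is a no-op.
inline Error Free(void* devPtr) {
    if (devPtr == nullptr) return Success;
    auto it = allocations().find(devPtr);
    if (it == allocations().end()) return ErrorInvalidValue;
    std::free(devPtr);
    allocations().erase(it);
    return Success;
}

}  // namespace silly
```

The allocation table is the part that carries over to a real backend: every pointer handed back to the CUDA program has to be resolvable to the underlying buffer when a later copy or kernel launch refers to it.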

🚀 Status & Roadmap

Currently, the project is in the initial development phase.

  • Phase 1: Mastering GPU Architecture & Basics (Vector Add, Matrix Mult).
  • Phase 2: Native Metal reimplementation of CUDA concepts.
  • Phase 3 (Current Goal): The "Vertical Slice" — getting a standard Vector Addition CUDA program to compile and run on macOS without modifying source code.
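A vector addition kernel typically computes its index from the CUDA built-ins blockIdx, blockDim, and threadIdx, so one piece of the Phase 3 slice is mapping those names onto Metal's kernel-argument attributes. The sketch below shows one possible approach, assuming the translator declares MSL parameters that reuse the CUDA names so the kernel body can stay unchanged. The `silly` namespace and both function names are hypothetical, and the plain substring search is a stand-in for the AST-based analysis the Translator is meant to perform.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper names; not taken from the sillyCUDA sources.
namespace silly {

// CUDA built-in variable -> MSL kernel-argument attribute.
inline const std::vector<std::pair<std::string, std::string>>& builtinTable() {
    static const std::vector<std::pair<std::string, std::string>> table = {
        {"threadIdx", "thread_position_in_threadgroup"},
        {"blockIdx",  "threadgroup_position_in_grid"},
        {"blockDim",  "threads_per_threadgroup"},
        {"gridDim",   "threadgroups_per_grid"},
    };
    return table;
}

// Returns one MSL parameter declaration for each built-in that the kernel
// body mentions, in table order. The search is a naive substring match, so
// it would also fire on comments or longer identifiers containing the name.
inline std::vector<std::string> builtinParams(const std::string& kernelBody) {
    std::vector<std::string> params;
    for (const auto& entry : builtinTable()) {
        if (kernelBody.find(entry.first) != std::string::npos) {
            params.push_back("uint3 " + entry.first + " [[" + entry.second + "]]");
        }
    }
    return params;
}

}  // namespace silly
```

For a body such as `int i = blockIdx.x * blockDim.x + threadIdx.x;`, the helper returns three declarations that could be appended to the generated kernel's parameter list, leaving the `.x` accesses in the body valid MSL.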

Scope & Limitations (v1.0 Goals)

Feature                                  | Status
Basic memory management (malloc, memcpy) | ✅ Planned
Kernel launches                          | ✅ Planned
Simple math operations                   | ✅ Planned
Texture memory                           | ❌ Out of scope
Advanced shared memory                   | ❌ Out of scope
Highly optimized cuBLAS libraries        | ❌ Out of scope

Target Performance: ~50% of native Metal execution speed.

🛠 Building & Installation

Requirements: macOS 13+, Xcode Command Line Tools, CMake.

# Clone the repository
git clone https://github.com/lwi00/sillyCUDA.git
cd sillyCUDA

# Create build directory
mkdir build && cd build

# Configure and build
cmake ..
make

🤝 Contributing

This is an open journey! Whether you are a student learning compilers or a graphics engineer, contributions are welcome.

  • Report Bugs: Open an issue if you find something broken (there will be many).

  • Discuss: Share ideas on how to improve the translation layer.

📄 License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.

Made with ❤️ and ☕️ by lwi00.