A transparent, high-performance C/C++ layer designed to radically accelerate CPU-based Large Language Model inference (Ollama, llama.cpp, vLLM) without any code changes or API modifications.
It functions as a native runtime interceptor on Linux, macOS, and Windows, transparently optimizing critical bottlenecks under the hood.
- Zero-Config Integration: Works as a drop-in wrapper. No codebase changes required.
- Smart Memory Pooling: Intercepts tensor allocations dynamically, aligning them to 64-byte cache lines (`posix_memalign`/`VirtualAlloc`) to eliminate CPU false sharing.
- Core Pinning & NUMA Awareness: Hooks `pthread_create` to prevent kernel thread migration, keeping L1/L2 caches hot.
- KV-Cache Optimization: Groundwork for detecting and mapping KV-cache buffers cleanly.
- Cross-Platform: Fully native hooking on Linux (`LD_PRELOAD`), macOS (`DYLD_INTERPOSE`), and Windows (custom IAT injector).
Prerequisites:

- CMake (v3.10+)
- C Compiler (GCC, Clang, or MSVC)
1. Clone the repository:
```sh
git clone https://github.com/overseek944/CoreFlux.git
cd CoreFlux
```

2. Compile using the provided scripts:
On Windows:
Simply double-click the `build.bat` file in your folder, or run it from the terminal:

```bat
build.bat
```

On Linux / macOS: run the provided shell script:
```sh
chmod +x build.sh
./build.sh
```

You can use CoreFlux transparently with any existing LLM application (e.g., llama.cpp, Ollama, vLLM). No configuration or flags are needed.
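This transparency comes from the dynamic loader: a preloaded library is mapped into the target process, and its constructors run before `main()` even starts. A minimal sketch of the mechanism (the file names `hook.c`/`libhook.so` are hypothetical, not part of CoreFlux):

```c
/* hook.c - build as a shared object and preload it into any program:
 *   cc -shared -fPIC hook.c -o libhook.so
 *   LD_PRELOAD=./libhook.so ls
 * The constructor below fires inside the target process before its
 * main() runs - that is the entire injection mechanism on Linux. */
#include <stdio.h>

static int hook_loaded = 0;

__attribute__((constructor))
static void hook_init(void) {
    hook_loaded = 1;
    fprintf(stderr, "[hook] injected before main()\n");
}
```

From there, any symbol the library defines (e.g., `malloc`) shadows the libc version for the whole process, which is how call interception works without code changes.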
On Linux: pre-load the shared library before launching your target executable.
```sh
LD_PRELOAD=/path/to/CoreFlux/build/libcpu_llm_accel.so ./llama.cpp -m model.gguf
```

On macOS: inject the dynamic library using Apple's preloading mechanism.
```sh
DYLD_INSERT_LIBRARIES=/path/to/CoreFlux/build/libcpu_llm_accel.dylib ./llama.cpp -m model.gguf
```

On Windows: use the built-in injector wrapper `accel_run.exe` to launch your target application. Ensure `cpu_llm_accel.dll` is in the same directory as the injector.
```bat
accel_run.exe llama.cpp -m model.gguf
```

This library operates similarly to jemalloc or OpenBLAS: it wraps standard OS memory and threading APIs. When tools like llama.cpp request heap space or background threads, the layer intercepts these calls, maps the CPU topology, and enforces strict physical-core affinity and memory boundaries tailored to LLM tensor workloads.
Contributions are welcome! Please read CONTRIBUTING.md before submitting pull requests.
Distributed under the MIT License. See LICENSE for more information.