
Universal CPU LLM Acceleration Layer 🚀


A transparent, high-performance C/C++ layer designed to radically accelerate CPU-based Large Language Model inference (Ollama, llama.cpp, vLLM) without any code changes or API modifications.

It works as a native runtime interceptor on Linux, macOS, and Windows, seamlessly optimizing critical bottlenecks under the hood.

⚡ Features

  • Zero-Config Integration: Works as a drop-in wrapper. No codebase changes required.
  • Smart Memory Pooling: Intercepts tensors dynamically, aligning them to 64B cache lines (posix_memalign / VirtualAlloc) to eliminate CPU false sharing.
  • Core Pinning & NUMA Awareness: Hijacks pthread_create to prevent kernel thread migration, keeping L1/L2 caches hot.
  • KV-Cache Optimization: Groundwork for detecting and mapping KV cache buffers cleanly.
  • Cross-Platform: Fully native hooking on Linux (LD_PRELOAD), macOS (DYLD_INTERPOSE), and Windows (Custom IAT Injector).

🛠️ Installation

Prerequisites

  • CMake (v3.10+)
  • C Compiler (GCC, Clang, or MSVC)

Build Instructions

1. Clone the repository:

git clone https://github.com/overseek944/CoreFlux.git
cd CoreFlux

2. Compile automatically:

On Windows: Double-click build.bat, or run it from a terminal:

build.bat

On Linux / macOS: Run the provided shell script:

chmod +x build.sh
./build.sh

🚀 Usage

You can use CoreFlux transparently with any existing LLM application (e.g., llama.cpp, ollama, vllm). No configuration or flags are needed.

Linux

Pre-load the shared library before launching your desired executable.

LD_PRELOAD=/path/to/CoreFlux/build/libcpu_llm_accel.so ./llama-cli -m model.gguf

macOS

Inject the dynamic library using Apple's preloading mechanism.

DYLD_INSERT_LIBRARIES=/path/to/CoreFlux/build/libcpu_llm_accel.dylib ./llama-cli -m model.gguf

Windows

Use the built-in injector wrapper accel_run.exe to launch your target application. Ensure cpu_llm_accel.dll is in the same directory as the injector.

accel_run.exe llama-cli.exe -m model.gguf

🧠 How it Works

This library operates like a drop-in allocator such as jemalloc: it wraps standard OS memory and threading APIs. When tools like llama.cpp request heap space or spawn background threads, the layer intercepts these calls, maps the CPU topology, and enforces physical core affinity and cache-line-aligned memory boundaries tailored to LLM tensor workloads.

🤝 Contributing

Contributions are welcome! Please read CONTRIBUTING.md before submitting a pull request.

📜 License

Distributed under the MIT License. See LICENSE for more information.
