
Local LLM with Nix Flakes

Optimized for AMD RX 6800XT / Ryzen 7900X with 32GB RAM.


🚀 Quick Start - Linux

Agentic workflows

To start the LLM server and the Pi agent:

nix develop .#agentic

This starts a llama-server on http://127.0.0.1:8080 and launches the pi.dev TUI pointing to it.
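llama-server exposes a built-in /health endpoint, so a quick sanity check that the server is up:

curl http://127.0.0.1:8080/health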

Note: The agentic shell's shellHook exports itself as the shellHook environment variable. If you run nix-shell -p <pkg> from within the agentic shell, the child shell inherits this variable and re-runs the hook, spawning a second llama-server. Use nix-shell --pure -p <pkg> or unset shellHook && nix-shell -p <pkg> to avoid this (see the sketch below).
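A minimal sketch of the workaround (jq is just a stand-in for whatever package you actually need):

# Check whether the hook leaked into the environment:
echo "$shellHook"

# Enter a nested shell without re-running the hook:
nix-shell --pure -p jq               # --pure drops the inherited environment
unset shellHook && nix-shell -p jq   # or drop just the offending variable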

llama.cpp chat interface

nix develop .#ui

This starts the llama.cpp web UI bound to 0.0.0.0:8080, so it is reachable at http://<local_ipv4>:8080 from the local network.

Additionally it runs


Model Installation

Download models into organized subdirectories within ./models/. This structure allows llama-server to automatically discover models when using --models-dir ./models --models-preset models.ini.
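Assuming the download commands below, the resulting layout looks like this (two models shown for brevity):

models/
├── gemma-4-26b/
│   ├── gemma-4-26B-A4B-it-Q8_0.gguf
│   └── multimodal/
│       └── mmproj-BF16.gguf
└── qwen3.6-35b/
    ├── Qwen3.6-35B-A3B-UD-Q6_K_XL.gguf
    └── mmproj-BF16.gguf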

Gemma 4 26B-A4B (MoE)

Active parameters: ~4B. High speed, efficient reasoning.

nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/gemma-4-26B-A4B-it-GGUF \
  gemma-4-26B-A4B-it-Q8_0.gguf \
  --local-dir ./models/gemma-4-26b
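The file should land at ./models/gemma-4-26b/gemma-4-26B-A4B-it-Q8_0.gguf; a quick check:

ls -lh ./models/gemma-4-26b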

Multimodal projector (mmproj), needed for image/audio input (additional download):

nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/gemma-4-26B-A4B-it-GGUF \
  mmproj-BF16.gguf \
  --local-dir ./models/gemma-4-26b/multimodal

Qwen 3.6 35B-A3B (MoE)

We choose the Q6_K_XL quant because it is the best quant according to Unsloth's benchmarks. Q8_0 is also an option, but it takes up more disk space.

nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/Qwen3.6-35B-A3B-GGUF \
  Qwen3.6-35B-A3B-UD-Q6_K_XL.gguf \
  --local-dir ./models/qwen3.6-35b

Multimodal projector (mmproj), needed for image input (additional download):

nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/Qwen3.6-35B-A3B-GGUF \
  mmproj-BF16.gguf \
  --local-dir ./models/qwen3.6-35b

Qwen 3.5 35B-A3B (MoE)

Active parameters: ~3B. Extremely fast Mixture of Experts model. Hugging Face Link

nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/Qwen3.5-35B-A3B-GGUF \
  Qwen3.5-35B-A3B-Q5_K_M.gguf \
  --local-dir ./models/qwen3.5-35b

Qwen 3.6 27B (Dense)

Full parameter computation for consistent depth and reasoning. Hugging Face Link

nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/Qwen3.6-27B-GGUF \
  Qwen3.6-27B-Q4_K_S.gguf \
  --local-dir ./models/qwen3.6-27b

Qwen 3.5 27B (Dense)

Full parameter computation for consistent depth and reasoning. Hugging Face Link

nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/Qwen3.5-27B-GGUF \
  Qwen3.5-27B-Q4_K_M.gguf \
  --local-dir ./models/qwen3.5-27b

Hardware Optimizations (AMD GPU)

To maximize performance on AMD RDNA2 hardware, these configurations are applied via llama-common.sh:

Environment Variables

ROCm

| Variable | Purpose | Benefit |
| --- | --- | --- |
| HIP_VISIBLE_DEVICES=0 | Selects the discrete GPU only (ignores the iGPU) to ensure full VRAM availability for model weights. | Prevents resource conflicts and ensures maximum memory usage. |
| GPU_ENABLE_WGP_MODE=0 | Forces scheduling at the individual Compute Unit level rather than Workgroup Processors. | Improved math utilization and better layer distribution on RDNA2. |

Vulkan

| Variable | Purpose | Benefit |
| --- | --- | --- |
| AMD_VULKAN_ICD=RADV | Uses the RADV Vulkan ICD instead of AMD's proprietary driver. | Better compatibility and performance with llama.cpp. |
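A hypothetical excerpt of llama-common.sh (the real script lives in this repo; this sketch assumes it simply exports the variables from the tables above):

# ROCm: pin llama.cpp to the discrete GPU, schedule per Compute Unit
export HIP_VISIBLE_DEVICES=0
export GPU_ENABLE_WGP_MODE=0

# Vulkan: prefer the Mesa RADV ICD over AMD's proprietary driver
export AMD_VULKAN_ICD=RADV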

M1 Mac (8 GB)

Install Nix via the Determinate installer.

Gemma 4 E2B

The "E" stands for "effective": an E2B model activates an effective ~2B parameters per token, even though its total parameter count is larger.

nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/gemma-4-E2B-it-GGUF \
  gemma-4-E2B-it-Q4_K_M.gguf \
  --local-dir ./models/gemma-4-e2b

Multimodal projector (mmproj), needed for image/audio input (additional download):

nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/gemma-4-E2B-it-GGUF \
  mmproj-BF16.gguf \
  --local-dir ./models/gemma-4-e2b/multimodal

Run the following for a llama.cpp UI with Gemma 4 E2B and image/audio support:

nix develop
