Optimized for AMD RX 6800XT / Ryzen 7900X with 32GB RAM.
To start the LLM server and the Pi agent:
```bash
nix develop .#agentic
```

This starts a `llama-server` on http://127.0.0.1:8080 and launches the pi.dev TUI pointing to it.
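To confirm the server came up, you can query llama-server's health endpoint from another terminal:

```bash
# Returns a small JSON status once the model has finished loading.
curl http://127.0.0.1:8080/health
```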
Note: The agentic shell uses a `shellHook` that exports the `shellHook` env var. If you run `nix-shell -p <pkg>` from within the agentic shell, the child inherits this variable and re-runs the hook (spawning a second llama server). Use `nix-shell --pure -p <pkg>` or `unset shellHook && nix-shell -p <pkg>` to avoid this.
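For example, either of these avoids spawning a second server:

```bash
# Pure shell: the child does not inherit the parent environment (and thus no shellHook).
nix-shell --pure -p <pkg>

# Or clear the inherited hook first, then spawn the child shell as usual.
unset shellHook && nix-shell -p <pkg>
```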
```bash
nix develop .#ui
```

This starts a llama.cpp UI on http://0.0.0.0:8080 (i.e. reachable at http://<local_ipv4>:8080 from the local network).
Additionally it runs
Download models into organized subdirectories within `./models/`. This structure allows `llama-server` to automatically discover models when using `--models-dir ./models --models-preset models.ini`.
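As a rough sketch, after running the download commands below the directory could look like this (names taken from those commands; the exact layout is up to you):

```
models/
├── gemma-4-26b/
│   ├── gemma-4-26B-A4B-it-Q8_0.gguf
│   └── multimodal/
│       └── mmproj-BF16.gguf
├── gemma-4-e2b/
│   ├── gemma-4-E2B-it-Q4_K_M.gguf
│   └── multimodal/
│       └── mmproj-BF16.gguf
├── qwen3.6-35b/
│   ├── Qwen3.6-35B-A3B-UD-Q6_K_XL.gguf
│   └── mmproj-BF16.gguf
├── qwen3.5-35b/
│   └── Qwen3.5-35B-A3B-Q5_K_M.gguf
├── qwen3.6-27b/
│   └── Qwen3.6-27B-Q4_K_S.gguf
└── qwen3.5-27b/
    └── Qwen3.5-27B-Q4_K_M.gguf
```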
Active parameters: ~4B. High speed, efficient reasoning.
```bash
nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/gemma-4-26B-A4B-it-GGUF \
  gemma-4-26B-A4B-it-Q8_0.gguf \
  --local-dir ./models/gemma-4-26b
```

```bash
nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/gemma-4-26B-A4B-it-GGUF \
  mmproj-BF16.gguf \
  --local-dir ./models/gemma-4-26b/multimodal
```

We choose the Q6_K_XL quant because it is the best quant according to unsloth's benchmarks. We could use Q8_0 if we wanted, but it would take up more space.
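Whichever quant you pick, the dev shells normally launch the server for you, but for a manual run against the Gemma files above, something along these lines should work (`--mmproj` attaches the multimodal projector; adjust paths to match what you downloaded):

```bash
llama-server \
  -m ./models/gemma-4-26b/gemma-4-26B-A4B-it-Q8_0.gguf \
  --mmproj ./models/gemma-4-26b/multimodal/mmproj-BF16.gguf \
  --host 127.0.0.1 --port 8080
```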
```bash
nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/Qwen3.6-35B-A3B-GGUF \
  Qwen3.6-35B-A3B-UD-Q6_K_XL.gguf \
  --local-dir ./models/qwen3.6-35b
```

```bash
nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/Qwen3.6-35B-A3B-GGUF \
  mmproj-BF16.gguf \
  --local-dir ./models/qwen3.6-35b
```

Active parameters: ~3B. Extremely fast Mixture of Experts model. Hugging Face Link
```bash
nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/Qwen3.5-35B-A3B-GGUF \
  Qwen3.5-35B-A3B-Q5_K_M.gguf \
  --local-dir ./models/qwen3.5-35b
```

Full parameter computation for consistent depth and reasoning. Hugging Face Link
```bash
nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/Qwen3.6-27B-GGUF \
  Qwen3.6-27B-Q4_K_S.gguf \
  --local-dir ./models/qwen3.6-27b
```

Full parameter computation for consistent depth and reasoning. Hugging Face Link
```bash
nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/Qwen3.5-27B-GGUF \
  Qwen3.5-27B-Q4_K_M.gguf \
  --local-dir ./models/qwen3.5-27b
```

To maximize performance on AMD RDNA2 hardware, the following configurations are applied via `llama-common.sh` (see the sketch after the tables below):
| Variable | Purpose | Benefit |
|---|---|---|
| `HIP_VISIBLE_DEVICES=0` | Selects the discrete GPU only (ignores the iGPU) to ensure full VRAM availability for model weights. | Prevents resource conflicts and ensures maximum memory usage. |
| `GPU_ENABLE_WGP_MODE=0` | Forces scheduling at the individual Compute Unit level rather than Workgroup Processors. | Improved math utilization and better layer distribution on RDNA2. |
| Variable | Purpose | Benefit |
|---|---|---|
| `AMD_VULKAN_ICD=RADV` | Uses the RADV Vulkan ICD instead of AMD's proprietary driver. | Better compatibility/performance with llama.cpp. |
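A minimal sketch of how `llama-common.sh` might export these (values taken from the tables above; the actual script may set more):

```bash
# RDNA2 (RX 6800 XT) tuning applied before launching llama-server.
export HIP_VISIBLE_DEVICES=0    # discrete GPU only; keep the iGPU out of the way
export GPU_ENABLE_WGP_MODE=0    # schedule per Compute Unit instead of per WGP
export AMD_VULKAN_ICD=RADV      # prefer the RADV Vulkan ICD for llama.cpp
```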
Install Nix via Determinate
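If Nix isn't installed yet, the Determinate Systems installer is a one-liner (command as of this writing; check their docs for the current form):

```bash
curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install
```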
TODO: What does the E mean
```bash
nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/gemma-4-E2B-it-GGUF \
  gemma-4-E2B-it-Q4_K_M.gguf \
  --local-dir ./models/gemma-4-e2b
```

```bash
nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/gemma-4-E2B-it-GGUF \
  mmproj-BF16.gguf \
  --local-dir ./models/gemma-4-e2b/multimodal
```

```bash
nix develop
```