
Local LLM with Nix Flakes

Optimized for AMD RX 6800XT / Ryzen 7900X with 32GB RAM.


🚀 Quick Start - Linux

Agentic workflows

To start the LLM server and the Pi agent:

nix develop .#agentic

This starts a llama-server on http://127.0.0.1:8080 and launches the pi.dev TUI pointing to it.
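llama-server exposes a built-in /health endpoint, so a quick sanity check that the server is up:

curl http://127.0.0.1:8080/health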

Note: The agentic shell's shellHook exports itself as the shellHook environment variable. If you run nix-shell -p <pkg> from within the agentic shell, the child shell inherits this variable and re-runs the hook, spawning a second llama-server. Use nix-shell --pure -p <pkg> or unset shellHook && nix-shell -p <pkg> to avoid this (see the sketch below).
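A minimal sketch of the workaround (jq is just a stand-in for whatever package you actually need):

# Check whether the hook leaked into the environment:
echo "$shellHook"

# Enter a nested shell without re-running the hook:
nix-shell --pure -p jq               # --pure drops the inherited environment
unset shellHook && nix-shell -p jq   # or drop just the offending variable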

llama.cpp chat interface

nix develop .#ui

This starts the llama.cpp web UI bound to 0.0.0.0:8080, so it is reachable at http://<local_ipv4>:8080 from the local network.

Additionally it runs


Model Installation

Download models into organized subdirectories within ./models/. This structure allows llama-server to automatically discover models when using --models-dir ./models --models-preset models.ini.
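Assuming the download commands below, the resulting layout looks like this (two models shown for brevity):

models/
├── gemma-4-26b/
│   ├── gemma-4-26B-A4B-it-Q8_0.gguf
│   └── multimodal/
│       └── mmproj-BF16.gguf
└── qwen3.6-35b/
    ├── Qwen3.6-35B-A3B-UD-Q6_K_XL.gguf
    └── mmproj-BF16.gguf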

Gemma 4 26B-A4B (MoE)

Active parameters: ~4B. High speed, efficient reasoning.

nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/gemma-4-26B-A4B-it-GGUF \
  gemma-4-26B-A4B-it-Q8_0.gguf \
  --local-dir ./models/gemma-4-26b
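The file should land at ./models/gemma-4-26b/gemma-4-26B-A4B-it-Q8_0.gguf; a quick check:

ls -lh ./models/gemma-4-26b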

Multimodal projector (mmproj), needed for image/audio input (additional download):

nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/gemma-4-26B-A4B-it-GGUF \
  mmproj-BF16.gguf \
  --local-dir ./models/gemma-4-26b/multimodal

Qwen 3.6 35B-A3B (MoE)

We choose the Q6_K_XL quant because it is the best quant according to Unsloth's benchmarks. Q8_0 is also an option, but it takes up more disk space.

nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/Qwen3.6-35B-A3B-GGUF \
  Qwen3.6-35B-A3B-UD-Q6_K_XL.gguf \
  --local-dir ./models/qwen3.6-35b

Multimodal projector (mmproj), needed for image input (additional download):

nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/Qwen3.6-35B-A3B-GGUF \
  mmproj-BF16.gguf \
  --local-dir ./models/qwen3.6-35b

Qwen 3.5 35B-A3B (MoE)

Active parameters: ~3B. Extremely fast Mixture of Experts model. Hugging Face Link

nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/Qwen3.5-35B-A3B-GGUF \
  Qwen3.5-35B-A3B-Q5_K_M.gguf \
  --local-dir ./models/qwen3.5-35b

Qwen 3.6 27B (Dense)

Full parameter computation for consistent depth and reasoning. Hugging Face Link

nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/Qwen3.6-27B-GGUF \
  Qwen3.6-27B-Q4_K_S.gguf \
  --local-dir ./models/qwen3.6-27b

Qwen 3.5 27B (Dense)

Full parameter computation for consistent depth and reasoning. Hugging Face Link

nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/Qwen3.5-27B-GGUF \
  Qwen3.5-27B-Q4_K_M.gguf \
  --local-dir ./models/qwen3.5-27b

Hardware Optimizations (AMD GPU)

To maximize performance on AMD RDNA2 hardware, these configurations are applied via llama-common.sh:

Environment Variables

ROCm

| Variable | Purpose | Benefit |
| --- | --- | --- |
| HIP_VISIBLE_DEVICES=0 | Selects the discrete GPU only (ignores the iGPU) to ensure full VRAM availability for model weights. | Prevents resource conflicts and ensures maximum memory usage. |
| GPU_ENABLE_WGP_MODE=0 | Forces scheduling at the individual Compute Unit level rather than Workgroup Processors. | Improved math utilization and better layer distribution on RDNA2. |

Vulkan

| Variable | Purpose | Benefit |
| --- | --- | --- |
| AMD_VULKAN_ICD=RADV | Uses the RADV Vulkan ICD instead of AMD's proprietary driver. | Better compatibility and performance with llama.cpp. |
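A hypothetical excerpt of llama-common.sh (the real script lives in this repo; this sketch assumes it simply exports the variables from the tables above):

# ROCm: pin llama.cpp to the discrete GPU, schedule per Compute Unit
export HIP_VISIBLE_DEVICES=0
export GPU_ENABLE_WGP_MODE=0

# Vulkan: prefer the Mesa RADV ICD over AMD's proprietary driver
export AMD_VULKAN_ICD=RADV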

M1 Mac (8 GB)

Install Nix via the Determinate installer.

Gemma 4 E2B

The "E" stands for "effective": an E2B model activates an effective ~2B parameters per token, even though its total parameter count is larger.

nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/gemma-4-E2B-it-GGUF \
  gemma-4-E2B-it-Q4_K_M.gguf \
  --local-dir ./models/gemma-4-e2b

Multimodal projector (mmproj), needed for image/audio input (additional download):

nix run nixpkgs#python313Packages.huggingface-hub -- download \
  unsloth/gemma-4-E2B-it-GGUF \
  mmproj-BF16.gguf \
  --local-dir ./models/gemma-4-e2b/multimodal

Run the following for a llama.cpp UI with Gemma 4 E2B and image/audio support:

nix develop
