A curated list of resources for running AI locally on consumer hardware -- LLMs, image generation, and AI agents without cloud dependencies. 230+ guides, tools, and community links.
Running AI locally means privacy, no subscriptions, and full control. This list covers the tools, guides, and communities that make it practical.
Last updated: 2026-03-06
- Getting Started
- Tools
- Hardware Guides
- Inference Engines
- User Interfaces
- Models
- Image Generation
- AI Agents
- Advanced Topics
- Use Cases
- Blog
- Communities
- Contributing
New to local AI? Start here.
- Run Your First Local LLM - Zero to chatting in 10 minutes with Ollama
- Ollama Quickstart - Official getting started guide
- LM Studio Download - Visual interface, no command line needed
- LocalLLaMA Wiki - Community-maintained knowledge base
- What is Quantization? - Understanding Q4, Q5, Q8 and why they matter
- Model Formats Explained - GGUF vs GPTQ vs AWQ vs EXL2
- Building a Local AI Assistant - Private Jarvis with Ollama, Open WebUI, Whisper, and TTS
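Everything in the quickstarts above ultimately talks to a local HTTP API: Ollama serves on port 11434 with a `/api/generate` endpoint. A minimal stdlib-only sketch (the model name `llama3.2` is just an example; substitute whatever you've pulled):

```python
import json
import urllib.request

def build_generate_request(model, prompt, host="http://localhost:11434"):
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def ask(model, prompt):
    """Send a prompt to a locally running Ollama server and return the reply."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires `ollama serve` running and the model pulled first):
# print(ask("llama3.2", "Why is the sky blue? Answer in one sentence."))
```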
Interactive tools for planning and optimizing your local AI setup.
- Local AI Planning Tool - Interactive VRAM calculator with hardware, model, and task entry points
Figuring out what hardware you need (or what to do with what you have).
- How Much VRAM Do You Need? - Model size to VRAM mapping
- What Can You Run on 4GB VRAM - GTX 1650, 1050 Ti users
- What Can You Run on 8GB VRAM - RTX 3060 Ti, 4060 class
- The 8GB VRAM Trap - What "runs on 8GB" actually means after quantization
- What Can You Run on 12GB VRAM - RTX 3060 12GB, 4070
- What Can You Run on 16GB VRAM - RTX 4060 Ti 16GB, 4080
- What Can You Run on 24GB VRAM - RTX 3090, 4090
- Running 70B Models Locally - Exact VRAM by quantization for Llama 70B+
- Mixtral 8x7B & 8x22B VRAM Requirements - MoE memory mapping
- Mixtral VRAM Requirements - Every quantization level for 8x7B and 8x22B
- KV Cache and VRAM - Why context length eats your VRAM and how to fix it
- num_ctx VRAM Overflow - The silent performance killer nobody warns about
- GPU Benchmarks for LLM Inference - Community benchmark database
- GPU Buying Guide for Local AI - Price/performance analysis
- Best GPU Under $300 - RTX 3060 12GB, RX 7600, Arc B580 compared
- Best GPU Under $500 - RTX 4060 Ti 16GB, used RTX 3080, RX 7700 XT
- Used RTX 3090 Buying Guide - The value king for 24GB VRAM
- Used GPU Buying Guide - eBay, Marketplace, what to look for
- Best Used GPUs for Local AI 2026 - Tier rankings, fair prices, what to avoid
- Used Server GPUs - Tesla P40, V100, A100 and the eBay goldmine
- Used Tesla P40 - 24GB VRAM for $150-200
- Budget AI PC Under $500 - Used Optiplex + GPU strategy
- Best Mini PCs for Local AI - Under $300 picks with real tok/s
- RTX 5090 for Local AI - Worth the upgrade?
- RTX 5060 Ti for Local AI - Budget next-gen 16GB GPU
- RTX 5060 Ti Benchmarks - Real LLM inference numbers
- RTX 4090 vs Used 3090 - Which to buy for AI workloads
- RTX 3090 vs 4070 Ti Super - Mid-range showdown for local LLMs
- RTX 3060 vs 3060 Ti vs 3070 - 12GB wins despite being the cheapest
- Intel Arc B580 for Local LLMs - 12GB VRAM at $250, with caveats
- Intel Arc GPUs for Local AI - A770 16GB, IPEX-LLM, SYCL setup
- GB10 Boxes Compared - NVIDIA GB10 hardware options
- Multi-GPU Setups: Worth It? - When dual GPUs beat one bigger card
- Multi-GPU Local AI - Run models across multiple GPUs with tensor/pipeline parallelism
- NVIDIA GPU Prices Rising - GDDR7 shortages and what to do
- Razer AI Kit Guide - Razer's dedicated AI hardware
- Build a Distributed AI Swarm Under $1,100 - Three-node cluster bill of materials
- Tom's Hardware GPU Hierarchy - General GPU rankings
- Mac vs PC for Local AI - Unified memory vs discrete GPU
- Running LLMs on Mac M-Series - M1/M2/M3/M4 complete guide
- M4 Max and Ultra for LLMs - Apple Silicon performance update
- Apple M5 Pro and Max - What 4x faster LLM processing means
- Apple Neural Engine for LLMs - What the ANE can and can't do
- Mac Mini M4 for Local AI - Best value Mac setup for AI
- Mac Studio for Local AI - M4 Max vs M3 Ultra pricing analysis
- Mac Studio M4 Every Config - M4 Max 128GB vs M3 Ultra 512GB ranked
- 8GB Apple Silicon Local AI - What actually runs on a budget Mac
- Best Local LLMs for Mac 2026 - Top picks for M-series
- AMD vs NVIDIA for Local AI - ROCm reality check
- ROCm vs CUDA in 2026 - The software gap nobody talks about
- Laptop vs Desktop for Local AI - Portability tradeoffs
- CPU-Only LLMs - Running models without a GPU
- WSL2 for Local AI - Complete Windows setup with GPU passthrough
- WSL2 + Ollama Setup - GPU passthrough, Docker Compose, VPN fixes
- Docker for Local AI - Ollama + Open WebUI with GPU passthrough
- Ubuntu 26.04 for Local AI - CUDA and ROCm in official repos
- Run LLMs on Old Phones - Termux, PocketPal AI, phone vs Pi 5
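Most of the VRAM guides above follow the same back-of-the-envelope math: weights take roughly params × bits-per-weight ÷ 8, plus runtime headroom, plus a KV cache that grows linearly with context length. A rough sketch (the 20% overhead factor and the ~4.5 effective bits for Q4 are rules of thumb, not exact figures):

```python
def model_vram_gb(params_b, bits_per_weight, overhead=1.2):
    """Rough VRAM to hold the weights, with ~20% headroom for activations
    and runtime buffers (a common rule of thumb, not an exact figure)."""
    return params_b * bits_per_weight / 8 * overhead

def kv_cache_gb(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """FP16 KV cache: 2 (K and V) x layers x kv_heads x head_dim x tokens."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 1024**3

# A 7B model at Q4 (~4.5 bits/weight including quantization metadata):
print(round(model_vram_gb(7, 4.5), 1))           # → 4.7 (GB)
# A Llama-3-8B-style cache (32 layers, 8 KV heads, head dim 128) at 8k context:
print(round(kv_cache_gb(32, 8, 128, 8192), 2))   # → 1.0 (GB)
```

This is why "runs on 8GB" claims depend so heavily on quantization level and `num_ctx`: the same model can fit comfortably or overflow depending on those two knobs alone.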
The software that actually runs the models.
- llama.cpp - The foundational CPU/GPU inference engine, supports GGUF
- Ollama - User-friendly wrapper around llama.cpp with model management
- vLLM - High-throughput serving for production deployments
- ExLlamaV2 - Fastest single-user NVIDIA inference, EXL2 format
- MLX - Apple's framework optimized for M-series Macs
- llama-cpp-python - Python bindings for llama.cpp
- candle - Rust ML framework with LLM support
- llama.cpp vs Ollama vs vLLM - When to use each
- ExLlamaV2 vs llama.cpp Speed - Benchmark comparison of inference backends
- LM Studio vs llama.cpp Speed Gap - Why the GUI runs 30-50% slower
- LM Studio vs Ollama on Mac - MLX is 2x faster than Ollama
- Speculative Decoding Explained - Free 20-50% speed boost using draft models
- MoE Models Explained - Why Mixtral uses 46B params but runs like 13B
- Ollama 0.16-0.17 Changes - 40% faster prompts, KV cache quantization, image gen
- Ollama on Mac Setup - Metal GPU, memory tuning, M1 through M4 Ultra
- Crane Qwen3-TTS Voice Cloning - Local voice cloning with Qwen3-TTS
- Qwen 2.5 VL + LM Studio Vision - Vision model setup in LM Studio
- PaddleOCR VL Local Document OCR - Document OCR running locally
- llama.cpp Hugging Face Acquisition - What the ggml.ai acquisition means
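llama-cpp-python (listed above) exposes llama.cpp through a `Llama` class. The helper below is hypothetical (the 8 GB cutoff is an arbitrary illustration), but the commented usage reflects the library's actual API as of recent versions; `n_gpu_layers=-1` offloads all layers to the GPU:

```python
def llama_cpp_kwargs(vram_gb, n_ctx=8192):
    """Hypothetical helper: pick llama_cpp.Llama settings for a VRAM budget.
    -1 means 'offload all layers' in llama-cpp-python; 0 keeps it CPU-only."""
    return {"n_gpu_layers": -1 if vram_gb >= 8 else 0, "n_ctx": n_ctx, "verbose": False}

# Usage (needs a local GGUF file; the model path is a placeholder):
# from llama_cpp import Llama
# llm = Llama(model_path="models/qwen2.5-7b-instruct-q4_k_m.gguf",
#             **llama_cpp_kwargs(vram_gb=12))
# out = llm.create_chat_completion(
#     messages=[{"role": "user", "content": "Say hello in five words."}])
# print(out["choices"][0]["message"]["content"])
```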
GUIs and web interfaces for interacting with local models.
- LM Studio - Polished desktop app with built-in model browser
- GPT4All - Simple desktop app, good for beginners
- Jan - Open-source ChatGPT alternative with local models
- Msty - Mac-native LLM interface
- Open WebUI - ChatGPT-style interface, works with Ollama
- text-generation-webui - The Swiss Army knife of local AI UIs
- SillyTavern - Frontend for chat/roleplay with local models
- LibreChat - Multi-provider chat interface
- AnythingLLM - RAG-focused interface with document upload
- Ollama vs LM Studio - Comparison of the two most popular tools
- Open WebUI Setup Guide - Installation and configuration
- Text Generation WebUI Guide - Power user setup
- LM Studio Tips & Tricks - Hidden features
- AnythingLLM Setup Guide - Chat with your documents locally
- Managing Multiple Models in Ollama - Storage, switching, cleanup
- How to Update Models in Ollama - Keep local LLMs current
- Running AI Offline - Air-gapped setups for field work
- Obsidian + Local LLM - Private AI-powered note search and summaries
- Obsidian + Local LLM Second Brain - Smart Connections, Open WebUI RAG, AnythingLLM
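Several of the Ollama management guides above lean on the server's `/api/tags` endpoint, which lists installed models and their on-disk sizes. A small sketch (the summarizing helper is our own; the endpoint and response fields are Ollama's):

```python
import json
import urllib.request

def summarize_tags(tags_json):
    """Turn Ollama's /api/tags response into (name, size-in-GB) pairs."""
    return [(m["name"], round(m["size"] / 1e9, 1)) for m in tags_json["models"]]

def list_local_models(host="http://localhost:11434"):
    """Query a running Ollama server for its installed models."""
    with urllib.request.urlopen(f"{host}/api/tags") as resp:
        return summarize_tags(json.load(resp))

# Offline example of the response shape:
sample = {"models": [{"name": "llama3.2:3b", "size": 2_019_393_189}]}
print(summarize_tags(sample))  # → [('llama3.2:3b', 2.0)]
```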
- Ollama Library - Curated models ready to run with `ollama pull`
- HuggingFace Open LLM Leaderboard - Benchmark comparisons
- TheBloke on HuggingFace - Quantized models in every format
- bartowski on HuggingFace - High-quality GGUF quantizations
- Llama 3 - Meta's flagship open model (1B to 405B)
- Qwen 2.5 - Alibaba's strong multilingual models
- Mistral - Efficient 7B-12B models
- DeepSeek - Strong reasoning and coding models
- Phi - Microsoft's small-but-capable models
- Gemma - Google's open models
- Llama 3 Guide - Every size from 1B to 405B
- Llama 4 Guide - Scout and Maverick
- Qwen Models Guide - Qwen 3, Qwen 2.5 Coder, Qwen-VL
- Qwen3 Complete Guide - All Qwen3 models compared
- Qwen 3.5 Local Guide - Latest Qwen 3.5 release
- Qwen 3.5 for Local AI - Which model, which quant, which GPU
- Qwen 3.5 Locally -- 27B vs 35B-A3B vs 122B - VRAM tables and tok/s benchmarks
- Qwen 3.5 on Mac: MLX vs Ollama - MLX is 2x faster, benchmarks and setup
- Qwen 3.5 9B Setup Guide - The new default for 8GB GPUs
- Qwen 3.5 Small Models - The 9B beats last-gen 30B
- Qwen vs Llama vs Mistral Shootout - Which family to build on
- DeepSeek Models Guide - R1, V3, Coder
- DeepSeek V3.2 Guide - What changed and how to run it
- DeepSeek V4 Preview - Everything we know before it drops
- GPT-OSS Guide - OpenAI's first open model
- Mistral & Mixtral Guide - 7B, Nemo, Mixtral MoE
- Gemma Models Guide - Gemma 3, Gemma 2, CodeGemma, PaliGemma
- Phi Models Guide - Phi-4, Phi-3.5, Phi-3
- LiquidAI LFM2 Guide - First hybrid model built for local hardware, 112 tok/s on CPU
- RWKV-7 Local AI Guide - Infinite context, zero KV cache
- RWKV-7 Local Guide - RNN that trains like a transformer, runs on anything
- Llama 4 vs Qwen3 vs DeepSeek V3.2 - Head-to-head comparison for local use
- Distilled vs Frontier Models - What you're actually getting
- LLM Benchmarks Lie - Why scores don't predict real-world performance
- Vision Models Locally - Qwen2.5-VL, Gemma 3, Llama 3.2 Vision, Moondream
- Best Uncensored Local LLMs - Dolphin, abliterated models, uncensored fine-tunes
- Best Models Under 3B - For edge devices
- Best Models for Coding - Code completion and generation
- Best Models for Math & Reasoning - DeepSeek R1, Qwen thinking
- Best Models for Writing - Creative and content work
- Best Models for Chat - Conversational assistants
- Best Models for Summarization - Chunking strategies, model picks
- Best Models for Translation - NLLB, Qwen, Opus-MT by language pair
- Best Models for Data Analysis - CSV, SQL, pandas with local LLMs
- Best Models for RAG - Best local models for retrieval-augmented generation
- Function Calling with Local LLMs - Tools, agents, and structured output
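The function-calling guide above hinges on one mechanic: the model emits a structured JSON tool call, and your code parses and dispatches it. A toy illustration (the exact call format varies by model family; `get_weather` is a stand-in implementation):

```python
import json

# Stand-in tool implementations; a real agent would hit an API or local code.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def dispatch(tool_call_text):
    """Parse a tool call like {"name": ..., "arguments": {...}} and run it."""
    call = json.loads(tool_call_text)
    return TOOLS[call["name"]](**call["arguments"])

print(dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}'))
# → Sunny in Oslo
```

Smaller local models fail this step more often than benchmarks suggest, which is why structured-output constraints (covered further down this list) matter so much for agents.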
- Stable Diffusion - The original open image model
- SDXL - Higher resolution, better quality
- Flux - Best open image model for prompt following
- SD 3.5 - Stability's latest
- Civitai - Community checkpoints, LoRAs, and embeddings
Interfaces and tools for running image generation locally.
- ComfyUI - Node-based workflow, supports everything
- AUTOMATIC1111 - Classic web UI, huge extension ecosystem
- Forge - A1111 fork with better performance
- Fooocus - Simplified UI, Midjourney-like experience
- SD.Next - A1111 fork with AMD/Intel support
- InvokeAI - Professional-grade creative tool
- Stable Diffusion Locally - Complete getting started guide
- Flux Locally - Running Flux on consumer hardware
- ComfyUI vs A1111 vs Fooocus - Which UI to choose
- SDXL vs SD 1.5 vs Flux - VRAM, speed, and quality compared
- AI Art Styles & Workflows Guide - Creative techniques for SD and Flux
- ControlNet Guide for Beginners - Canny, OpenPose, Depth preprocessors
- AI Upscaling Locally - Real-ESRGAN, SUPIR, and ComfyUI workflows
- Best Photorealism Checkpoints - Juggernaut XL, RealVisXL, Realistic Vision, Flux
- Best Anime & Stylized Checkpoints - Illustrious XL, NoobAI-XL, Animagine, Pony
- Stable Diffusion on Mac - Draw Things, ComfyUI, MLX
- Local AI Video Generation - What actually works on consumer hardware
- ControlNet - Precise image control
- IP-Adapter - Style and subject transfer
- AnimateDiff - Video generation from SD
- Upscayl - AI image upscaling
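The SD-family tools above all denoise in a latent space downscaled 8x from pixel space with 4 channels (true for SD 1.5 and SDXL; Flux and SD 3.x use different latent layouts), which is why VRAM use climbs steeply with resolution. A quick sketch, with hedged diffusers usage in the comments:

```python
def latent_shape(width, height, channels=4, downscale=8):
    """SD 1.5 and SDXL denoise in a latent space downscaled 8x from pixels,
    with 4 latent channels (Flux and SD 3.x use different latent layouts)."""
    return (channels, height // downscale, width // downscale)

print(latent_shape(512, 512))    # SD 1.5 native → (4, 64, 64)
print(latent_shape(1024, 1024))  # SDXL native  → (4, 128, 128)

# Usage with diffusers (needs a GPU and a model download; illustrative only):
# from diffusers import StableDiffusionXLPipeline
# import torch
# pipe = StableDiffusionXLPipeline.from_pretrained(
#     "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
# ).to("cuda")
# pipe("a lighthouse at dusk, oil painting").images[0].save("out.png")
```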
Running autonomous AI agents locally.
- OpenClaw - Open-source AI agent framework
- AutoGPT - Autonomous GPT-4 agent
- CrewAI - Multi-agent orchestration
- LangChain - LLM application framework
- LlamaIndex - Data framework for LLM apps
- Haystack - NLP framework for agents
- LocalAgent - Local-first agent runtime with safe tool calling
- SmarterRouter - VRAM-aware LLM gateway for local AI
- OpenClaw Setup Guide - Local AI agent installation
- How OpenClaw Actually Works - Architecture guide: messages, heartbeats, crons, hooks, webhooks
- OpenClaw Security Guide - Hardening autonomous agents
- OpenClaw Security Feb 2026 - SSRF bypass, sandbox escapes, every fix
- Best Models for OpenClaw - Which Ollama models work for agents
- OpenClaw Model Combinations - What to pair for each task
- OpenClaw vs Commercial Agents - Local vs Lindy, Rabbit, etc.
- OpenClaw vs Cursor - Local AI agent or cloud IDE?
- OpenClaw on Mac - Apple Silicon setup and optimization
- OpenClaw on Raspberry Pi - What works and what doesn't on Pi 5
- OpenClaw on Low VRAM GPUs - 4GB, 6GB, and 8GB GPU guide
- OpenClaw Tool Call Failures - Why models break and how to fix them
- OpenClaw ClawHub Security Alert - 341 malicious skills found in marketplace
- ClawHub Malware Alert - The #1 skill was malware, how to protect yourself
- OpenClaw Plugins & Skills Guide - Navigating the skills ecosystem safely
- Best OpenClaw Tools & Extensions - Crabwalk, Tokscale, openclaw-docker
- OpenClaw Token Optimization - Reduce API costs for agent workflows
- OpenClaw Hardware Guide - Mac Mini, VPS, or PC for agents
- OpenClaw Local Zero API Costs - Run OpenClaw fully local
- OpenClaw Memory & Context Rot - Fix agent memory issues
- OpenClaw Model Routing - Route requests to different models
- OpenClaw After Steinberger - What the OpenAI move means for your setup
- OpenClaw Creator Joins OpenAI - What it means for local AI agents
- Best OpenClaw Alternatives - NanoClaw, Nanobot, ZeroClaw, LightClaw
- Every OpenClaw Alternative 2026 - Comprehensive comparison
- LightClaw - 7,000-line Python alternative to OpenClaw
- Building AI Agents with Local LLMs - Model requirements, VRAM budgets, framework comparison
- What Agents Can't Do Yet - Seven human capabilities missing from AI systems
- Agent Trust Decay - Why long-running agents get worse over time
- Intent Engineering for Agents - Why agents need more than context
- Intent Engineering Practical Guide - Goals, decision boundaries, value hierarchies
- The Agentic Web - What a parallel web for AI agents means for local builders
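Most agent frameworks above share the same inner loop: the model either emits a structured tool call or a final answer, and the runtime dispatches accordingly. A stripped-down sketch of one step (the JSON-or-text routing convention here is illustrative, not any specific framework's protocol):

```python
import json

def agent_step(model_reply, tools):
    """One step of a minimal agent loop: valid JSON is treated as a tool
    call and executed; anything else is returned as the final answer."""
    try:
        call = json.loads(model_reply)
    except json.JSONDecodeError:
        return ("final", model_reply)
    return ("observation", tools[call["tool"]](**call["args"]))

tools = {"add": lambda a, b: a + b}
print(agent_step('{"tool": "add", "args": {"a": 2, "b": 3}}', tools))
# → ('observation', 5)
print(agent_step("The answer is 5.", tools))
# → ('final', 'The answer is 5.')
```

Real frameworks add the pieces the loop leaves out, which is exactly where the security and reliability guides above come in: sandboxed tool execution, retry on malformed calls, and context management across steps.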
Going deeper into local AI.
- Beyond Transformers: 5 Architectures - What comes after Transformers
- Context Length Explained - How context windows work
- Hallucination Feedback Loop - When AI errors compound
- Session as RAG - Using conversation as retrieval
- AI Memory Wall - Why chatbots forget
- Distributed Wisdom Thinking Network - Multi-node reasoning
- Ouro 2B Thinking Model - Looped reasoning on small hardware
- Local RAG Guide - Search your documents with private AI
- Embedding Models for RAG - nomic-embed-text, Qwen3-Embedding, bge-m3 compared
- Ghost Knowledge - When your RAG system cites documents that no longer exist
- Chroma - Open-source embedding database
- Qdrant - Vector similarity search engine
- FAISS - Facebook's similarity search library
- Fine-Tuning on Consumer Hardware - LoRA and QLoRA guide
- Fine-Tuning on Mac - LoRA and QLoRA with MLX on Apple Silicon
- LoRA Training on Consumer Hardware - Fine-tune locally on your GPU
- NanoLlama Train From Scratch - Train your own Llama from zero
- Axolotl - Streamlined fine-tuning tool
- Unsloth - 2x faster fine-tuning
- PEFT - HuggingFace parameter-efficient fine-tuning
- Voice Chat with Local LLMs - Whisper + TTS setup
- Whisper - OpenAI's speech recognition
- whisper.cpp - Fast Whisper inference
- Coqui TTS - Text-to-speech synthesis
- Piper - Fast local TTS
- mycoSwarm vs Exo vs Petals - Distributed inference frameworks compared
- mycoSwarm WiFi Laptop Setup - Distributed AI on borrowed GPUs
- Why mycoSwarm Was Born - From Claude Code envy to building a distributed AI runtime
- Continue - VS Code/JetBrains AI assistant
- Tabby - Self-hosted code completion
- Aider - AI pair programming in terminal
- Codeium - Free AI code completion (cloud + local options)
- Replace GitHub Copilot with Local LLMs - Free, private AI code completion in VS Code
- Claude Code vs PI Agent - Cloud coding agent vs local alternative
- PI Agent + Ollama - Run a coding agent on local models
- Local Alternatives to Claude Code - Code agents without cloud
- 5 Levels of AI Coding - From autocomplete to dark factories
- CodeLlama vs DeepSeek vs Qwen Coder - Coding model comparison
- Token Audit Guide - Track what AI actually costs
- Tiered AI Model Strategy - When to use local vs cloud
- Model Routing for Local AI - Stop using one model for everything
- AI Tool Sprawl - Consolidation guide for too many AI tools
- Local LLMs vs ChatGPT - Honest comparison
- Local LLMs vs Claude - Honest comparison
- Pi AI vs Local AI - Cloud companion or private assistant?
- Cost to Run LLMs Locally - Real electricity and hardware costs
- Free Local AI vs Paid Cloud APIs - Break-even math with 2026 API pricing
- The Complexity Cliff - Why the jump from hello world to useful is so hard
- Prompt Debt - When your system prompt becomes unmaintainable
- AI Market Panic Explained - Running local puts you on the right side of the gap
- Local AI Privacy Guide - What's private, what leaks, and how to lock it down
- Structured Output from Local LLMs - Force valid JSON/YAML with grammar constraints
- Local AI Troubleshooting Guide - Fix common problems
- Ollama Troubleshooting Guide - Ollama-specific fixes
- Ollama Mac Troubleshooting - Metal, memory pressure, slow performance
- CUDA Out of Memory Fix - GPU memory errors solved
- Ollama Not Using GPU Fix - Force GPU usage in Ollama
- Ollama API Connection Refused - Port, Docker, and network fixes
- Open WebUI Not Connecting to Ollama - Docker networking and WSL2 fixes
- ROCm Not Detecting GPU Fix - AMD GPU detection issues
- Why Is My Local LLM Slow? - Speed diagnosis guide
- LLM Running Slow? Two Different Fixes - Separate prefill from generation speed
- Context Length Exceeded Fix - Fix token limit errors
- Memory Leak in Long Conversations - VRAM leak fix for long sessions
- GGUF File Won't Load - Format and compatibility fixes
- Model Outputs Garbage - Debug bad generations
- llama.cpp Build Errors - Common fixes for every platform
- Qwen2.5-VL LM Studio Troubleshooting - Fix mmproj and vision errors
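The RAG guides above all reduce to the same operation: embed your document chunks, embed the query, and rank chunks by cosine similarity. The math is small enough to show inline (toy 3-dimensional vectors; real embedding models like nomic-embed-text produce hundreds of dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

doc_emb = [0.2, 0.8, 0.1]       # embedding of a stored chunk
query_emb = [0.25, 0.75, 0.05]  # embedding of the user's question
print(round(cosine(query_emb, doc_emb), 3))  # close to 1.0: strong match
```

Vector databases like Chroma, Qdrant, and FAISS (listed above) exist to do this ranking at scale, with indexing so you never compare the query against every chunk.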
Practical applications and scenario-specific guides.
- Local AI Use Cases Cloud Can't Touch - Privacy-first use cases only local can do
- Local AI for Lawyers - Confidential document analysis without cloud risk
- Local AI for Therapists - Session notes and treatment plans without cloud
- Local AI for Accountants - Tax prep and financial analysis locally
- Local AI for Accounting Privacy - Keep financial data off the cloud
- Local AI for Small Business - Replace $1,500/yr in AI subscriptions with a $600 mini PC
- Rescued Hardware, Rescued Bees - Building tech from what others throw away
- What Open Source Was Supposed to Be - Open source promised freedom; we got free labor
- Developmental Alignment - What if we just raised it well?
- Teaching AI About Death - A local AI describes her own death as Taoist philosophy
- Teaching AI What Love Means - What happens when you give a local AI an identity
Development logs, news reactions, and opinion pieces.
- GPT-5.4 Dropped. Here's Why I'm Not Switching - 1M context, beats humans on OSWorld, still costs money
- Qwen's Architect Just Walked Out the Door - Junyang Lin leaves Alibaba
- Wu Wei and the AI Agent That Did Too Much - Taoist non-action and agentic AI design
- Teaching AI to Accept Help: Day 4 With Monica - Local AI resists corrections, then self-diagnoses
- mycoSwarm Week 1: Four-Node Swarm - From idea to working distributed cluster
- mycoSwarm Week 2: Raspberry Pi Joins - Persistent memory, document RAG, WiFi GPU
- mycoSwarm Week 3: Unified Memory Search - Session-as-RAG, topic splitting, citation tracking
Where to get help and stay updated.
- r/LocalLLaMA - The main hub for local AI discussion
- r/StableDiffusion - Image generation community
- r/Ollama - Ollama-specific discussion
- r/Oobabooga - text-generation-webui community
- LocalLLaMA Discord - Active chat community
- Ollama Discord - Official Ollama server
- ComfyUI Discord - ComfyUI community
- Stable Diffusion Discord - Image generation community
- HuggingFace Discussions - Model-specific discussions
- llama.cpp GitHub Discussions - Technical questions
- LocalAI Slack - LocalAI community
Contributions welcome! Please read the contribution guidelines first.
- Add resources that are genuinely useful for running AI locally
- Include a brief description explaining why the resource is valuable
- Verify links are working and resources are actively maintained
- Keep the list organized and avoid duplicates
To the extent possible under law, the contributors have waived all copyright and related rights to this work.