A curated list of resources for running AI locally on consumer hardware -- LLMs, image generation, and AI agents without cloud dependencies. 230+ guides, tools, and community links.
Running AI locally means privacy, no subscriptions, and full control. This list covers the tools, guides, and communities that make it practical.
Last updated: 2026-03-06
- Getting Started
- Tools
- Hardware Guides
- Inference Engines
- User Interfaces
- Models
- Image Generation
- AI Agents
- Advanced Topics
- Use Cases
- Blog
- Communities
- Contributing
New to local AI? Start here.
- Run Your First Local LLM - Zero to chatting in 10 minutes with Ollama
- Ollama Quickstart - Official getting started guide
- LM Studio Download - Visual interface, no command line needed
- LocalLLaMA Wiki - Community-maintained knowledge base
- What is Quantization? - Understanding Q4, Q5, Q8 and why they matter
- Model Formats Explained - GGUF vs GPTQ vs AWQ vs EXL2
- Building a Local AI Assistant - Private Jarvis with Ollama, Open WebUI, Whisper, and TTS
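Everything in the quickstarts above ultimately talks to a local HTTP API: Ollama serves on port 11434 with a `/api/generate` endpoint. A minimal stdlib-only sketch (the model name `llama3.2` is just an example; substitute whatever you've pulled):

```python
import json
import urllib.request

def build_generate_request(model, prompt, host="http://localhost:11434"):
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def ask(model, prompt):
    """Send a prompt to a locally running Ollama server and return the reply."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]

# Usage (requires `ollama serve` running and the model pulled first):
# print(ask("llama3.2", "Why is the sky blue? Answer in one sentence."))
```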
Interactive tools for planning and optimizing your local AI setup.
- Local AI Planning Tool - Interactive VRAM calculator with hardware, model, and task entry points
Figuring out what hardware you need (or what to do with what you have).
- How Much VRAM Do You Need? - Model size to VRAM mapping
- What Can You Run on 4GB VRAM - GTX 1650, 1050 Ti users
- What Can You Run on 8GB VRAM - RTX 3060 Ti, 4060 class
- The 8GB VRAM Trap - What "runs on 8GB" actually means after quantization
- What Can You Run on 12GB VRAM - RTX 3060 12GB, 4070
- What Can You Run on 16GB VRAM - RTX 4060 Ti 16GB, 4080
- What Can You Run on 24GB VRAM - RTX 3090, 4090
- Running 70B Models Locally - Exact VRAM by quantization for Llama 70B+
- Mixtral 8x7B & 8x22B VRAM Requirements - MoE memory mapping
- Mixtral VRAM Requirements - Every quantization level for 8x7B and 8x22B
- KV Cache and VRAM - Why context length eats your VRAM and how to fix it
- num_ctx VRAM Overflow - The silent performance killer nobody warns about
- GPU Benchmarks for LLM Inference - Community benchmark database
- GPU Buying Guide for Local AI - Price/performance analysis
- Best GPU Under $300 - RTX 3060 12GB, RX 7600, Arc B580 compared
- Best GPU Under $500 - RTX 4060 Ti 16GB, used RTX 3080, RX 7700 XT
- Used RTX 3090 Buying Guide - The value king for 24GB VRAM
- Used GPU Buying Guide - eBay, Marketplace, what to look for
- Best Used GPUs for Local AI 2026 - Tier rankings, fair prices, what to avoid
- Used Server GPUs - Tesla P40, V100, A100 and the eBay goldmine
- Used Tesla P40 - 24GB VRAM for $150-200
- Budget AI PC Under $500 - Used Optiplex + GPU strategy
- Best Mini PCs for Local AI - Under $300 picks with real tok/s
- RTX 5090 for Local AI - Worth the upgrade?
- RTX 5060 Ti for Local AI - Budget next-gen 16GB GPU
- RTX 5060 Ti Benchmarks - Real LLM inference numbers
- RTX 4090 vs Used 3090 - Which to buy for AI workloads
- RTX 3090 vs 4070 Ti Super - Mid-range showdown for local LLMs
- RTX 3060 vs 3060 Ti vs 3070 - 12GB wins despite being the cheapest
- Intel Arc B580 for Local LLMs - 12GB VRAM at $250, with caveats
- Intel Arc GPUs for Local AI - A770 16GB, IPEX-LLM, SYCL setup
- GB10 Boxes Compared - NVIDIA GB10 hardware options
- Multi-GPU Setups: Worth It? - When dual GPUs beat one bigger card
- Multi-GPU Local AI - Run models across multiple GPUs with tensor/pipeline parallelism
- NVIDIA GPU Prices Rising - GDDR7 shortages and what to do
- Razer AI Kit Guide - Razer's dedicated AI hardware
- Build a Distributed AI Swarm Under $1,100 - Three-node cluster bill of materials
- Tom's Hardware GPU Hierarchy - General GPU rankings
- Mac vs PC for Local AI - Unified memory vs discrete GPU
- Running LLMs on Mac M-Series - M1/M2/M3/M4 complete guide
- M4 Max and Ultra for LLMs - Apple Silicon performance update
- Apple M5 Pro and Max - What 4x faster LLM processing means
- Apple Neural Engine for LLMs - What the ANE can and can't do
- Mac Mini M4 for Local AI - Best value Mac setup for AI
- Mac Studio for Local AI - M4 Max vs M3 Ultra pricing analysis
- Mac Studio M4 Every Config - M4 Max 128GB vs M3 Ultra 512GB ranked
- 8GB Apple Silicon Local AI - What actually runs on a budget Mac
- Best Local LLMs for Mac 2026 - Top picks for M-series
- AMD vs NVIDIA for Local AI - ROCm reality check
- ROCm vs CUDA in 2026 - The software gap nobody talks about
- Laptop vs Desktop for Local AI - Portability tradeoffs
- CPU-Only LLMs - Running models without a GPU
- WSL2 for Local AI - Complete Windows setup with GPU passthrough
- WSL2 + Ollama Setup - GPU passthrough, Docker Compose, VPN fixes
- Docker for Local AI - Ollama + Open WebUI with GPU passthrough
- Ubuntu 26.04 for Local AI - CUDA and ROCm in official repos
- Run LLMs on Old Phones - Termux, PocketPal AI, phone vs Pi 5
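Most of the VRAM guides above follow the same back-of-the-envelope math: weights take roughly params × bits-per-weight ÷ 8, plus runtime headroom, plus a KV cache that grows linearly with context length. A rough sketch (the 20% overhead factor and the ~4.5 effective bits for Q4 are rules of thumb, not exact figures):

```python
def model_vram_gb(params_b, bits_per_weight, overhead=1.2):
    """Rough VRAM to hold the weights, with ~20% headroom for activations
    and runtime buffers (a common rule of thumb, not an exact figure)."""
    return params_b * bits_per_weight / 8 * overhead

def kv_cache_gb(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """FP16 KV cache: 2 (K and V) x layers x kv_heads x head_dim x tokens."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem / 1024**3

# A 7B model at Q4 (~4.5 bits/weight including quantization metadata):
print(round(model_vram_gb(7, 4.5), 1))           # → 4.7 (GB)
# A Llama-3-8B-style cache (32 layers, 8 KV heads, head dim 128) at 8k context:
print(round(kv_cache_gb(32, 8, 128, 8192), 2))   # → 1.0 (GB)
```

This is why "runs on 8GB" claims depend so heavily on quantization level and `num_ctx`: the same model can fit comfortably or overflow depending on those two knobs alone.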
The software that actually runs the models.
- llama.cpp - The foundational CPU/GPU inference engine, supports GGUF
- Ollama - User-friendly wrapper around llama.cpp with model management
- vLLM - High-throughput serving for production deployments
- ExLlamaV2 - Fastest single-user NVIDIA inference, EXL2 format
- MLX - Apple's framework optimized for M-series Macs
- llama-cpp-python - Python bindings for llama.cpp
- candle - Rust ML framework with LLM support
- llama.cpp vs Ollama vs vLLM - When to use each
- ExLlamaV2 vs llama.cpp Speed - Benchmark comparison of inference backends
- LM Studio vs llama.cpp Speed Gap - Why the GUI runs 30-50% slower
- LM Studio vs Ollama on Mac - MLX is 2x faster than Ollama
- Speculative Decoding Explained - Free 20-50% speed boost using draft models
- MoE Models Explained - Why Mixtral uses 46B params but runs like 13B
- Ollama 0.16-0.17 Changes - 40% faster prompts, KV cache quantization, image gen
- Ollama on Mac Setup - Metal GPU, memory tuning, M1 through M4 Ultra
- Crane Qwen3-TTS Voice Cloning - Local voice cloning with Qwen3-TTS
- Qwen 2.5 VL + LM Studio Vision - Vision model setup in LM Studio
- PaddleOCR VL Local Document OCR - Document OCR running locally
- llama.cpp Hugging Face Acquisition - What the ggml.ai acquisition means
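llama-cpp-python (listed above) exposes llama.cpp through a `Llama` class. The helper below is hypothetical (the 8 GB cutoff is an arbitrary illustration), but the commented usage reflects the library's actual API as of recent versions; `n_gpu_layers=-1` offloads all layers to the GPU:

```python
def llama_cpp_kwargs(vram_gb, n_ctx=8192):
    """Hypothetical helper: pick llama_cpp.Llama settings for a VRAM budget.
    -1 means 'offload all layers' in llama-cpp-python; 0 keeps it CPU-only."""
    return {"n_gpu_layers": -1 if vram_gb >= 8 else 0, "n_ctx": n_ctx, "verbose": False}

# Usage (needs a local GGUF file; the model path is a placeholder):
# from llama_cpp import Llama
# llm = Llama(model_path="models/qwen2.5-7b-instruct-q4_k_m.gguf",
#             **llama_cpp_kwargs(vram_gb=12))
# out = llm.create_chat_completion(
#     messages=[{"role": "user", "content": "Say hello in five words."}])
# print(out["choices"][0]["message"]["content"])
```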
GUIs and web interfaces for interacting with local models.
- LM Studio - Polished desktop app with built-in model browser
- GPT4All - Simple desktop app, good for beginners
- Jan - Open-source ChatGPT alternative with local models
- Msty - Mac-native LLM interface
- Open WebUI - ChatGPT-style interface, works with Ollama
- text-generation-webui - The Swiss Army knife of local AI UIs
- SillyTavern - Frontend for chat/roleplay with local models
- LibreChat - Multi-provider chat interface
- AnythingLLM - RAG-focused interface with document upload
- Ollama vs LM Studio - Comparison of the two most popular tools
- Open WebUI Setup Guide - Installation and configuration
- Text Generation WebUI Guide - Power user setup
- LM Studio Tips & Tricks - Hidden features
- AnythingLLM Setup Guide - Chat with your documents locally
- Managing Multiple Models in Ollama - Storage, switching, cleanup
- How to Update Models in Ollama - Keep local LLMs current
- Running AI Offline - Air-gapped setups for field work
- Obsidian + Local LLM - Private AI-powered note search and summaries
- Obsidian + Local LLM Second Brain - Smart Connections, Open WebUI RAG, AnythingLLM
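Several of the Ollama management guides above lean on the server's `/api/tags` endpoint, which lists installed models and their on-disk sizes. A small sketch (the summarizing helper is our own; the endpoint and response fields are Ollama's):

```python
import json
import urllib.request

def summarize_tags(tags_json):
    """Turn Ollama's /api/tags response into (name, size-in-GB) pairs."""
    return [(m["name"], round(m["size"] / 1e9, 1)) for m in tags_json["models"]]

def list_local_models(host="http://localhost:11434"):
    """Query a running Ollama server for its installed models."""
    with urllib.request.urlopen(f"{host}/api/tags") as resp:
        return summarize_tags(json.load(resp))

# Offline example of the response shape:
sample = {"models": [{"name": "llama3.2:3b", "size": 2_019_393_189}]}
print(summarize_tags(sample))  # → [('llama3.2:3b', 2.0)]
```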
- Ollama Library - Curated models ready to run with `ollama pull`
- HuggingFace Open LLM Leaderboard - Benchmark comparisons
- TheBloke on HuggingFace - Quantized models in every format
- bartowski on HuggingFace - High-quality GGUF quantizations
- Llama 3 - Meta's flagship open model (1B to 405B)
- Qwen 2.5 - Alibaba's strong multilingual models
- Mistral - Efficient 7B-12B models
- DeepSeek - Strong reasoning and coding models
- Phi - Microsoft's small-but-capable models
- Gemma - Google's open models
- Llama 3 Guide - Every size from 1B to 405B
- Llama 4 Guide - Scout and Maverick
- Qwen Models Guide - Qwen 3, Qwen 2.5 Coder, Qwen-VL
- Qwen3 Complete Guide - All Qwen3 models compared
- Qwen 3.5 Local Guide - Latest Qwen 3.5 release
- Qwen 3.5 for Local AI - Which model, which quant, which GPU
- Qwen 3.5 Locally -- 27B vs 35B-A3B vs 122B - VRAM tables and tok/s benchmarks
- Qwen 3.5 on Mac: MLX vs Ollama - MLX is 2x faster, benchmarks and setup
- Qwen 3.5 9B Setup Guide - The new default for 8GB GPUs
- Qwen 3.5 Small Models - The 9B beats last-gen 30B
- Qwen vs Llama vs Mistral Shootout - Which family to build on
- DeepSeek Models Guide - R1, V3, Coder
- DeepSeek V3.2 Guide - What changed and how to run it
- DeepSeek V4 Preview - Everything we know before it drops
- GPT-OSS Guide - OpenAI's first open model
- Mistral & Mixtral Guide - 7B, Nemo, Mixtral MoE
- Gemma Models Guide - Gemma 3, Gemma 2, CodeGemma, PaliGemma
- Phi Models Guide - Phi-4, Phi-3.5, Phi-3
- LiquidAI LFM2 Guide - First hybrid model built for local hardware, 112 tok/s on CPU
- RWKV-7 Local AI Guide - Infinite context, zero KV cache
- RWKV-7 Local Guide - RNN that trains like a transformer, runs on anything
- Llama 4 vs Qwen3 vs DeepSeek V3.2 - Head-to-head comparison for local use
- Distilled vs Frontier Models - What you're actually getting
- LLM Benchmarks Lie - Why scores don't predict real-world performance
- Vision Models Locally - Qwen2.5-VL, Gemma 3, Llama 3.2 Vision, Moondream
- Best Uncensored Local LLMs - Dolphin, abliterated models, uncensored fine-tunes
- Best Models Under 3B - For edge devices
- Best Models for Coding - Code completion and generation
- Best Models for Math & Reasoning - DeepSeek R1, Qwen thinking
- Best Models for Writing - Creative and content work
- Best Models for Chat - Conversational assistants
- Best Models for Summarization - Chunking strategies, model picks
- Best Models for Translation - NLLB, Qwen, Opus-MT by language pair
- Best Models for Data Analysis - CSV, SQL, pandas with local LLMs
- Best Models for RAG - Best local models for retrieval-augmented generation
- Function Calling with Local LLMs - Tools, agents, and structured output
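The function-calling guide above hinges on one mechanic: the model emits a structured JSON tool call, and your code parses and dispatches it. A toy illustration (the exact call format varies by model family; `get_weather` is a stand-in implementation):

```python
import json

# Stand-in tool implementations; a real agent would hit an API or local code.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def dispatch(tool_call_text):
    """Parse a tool call like {"name": ..., "arguments": {...}} and run it."""
    call = json.loads(tool_call_text)
    return TOOLS[call["name"]](**call["arguments"])

print(dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}'))
# → Sunny in Oslo
```

Smaller local models fail this step more often than benchmarks suggest, which is why structured-output constraints (covered further down this list) matter so much for agents.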
- Stable Diffusion - The original open image model
- SDXL - Higher resolution, better quality
- Flux - Best open image model for prompt following
- SD 3.5 - Stability's latest
- Civitai - Community checkpoints, LoRAs, and embeddings
Interfaces and tools for running image generation locally.
- ComfyUI - Node-based workflow, supports everything
- AUTOMATIC1111 - Classic web UI, huge extension ecosystem
- Forge - A1111 fork with better performance
- Fooocus - Simplified UI, Midjourney-like experience
- SD.Next - A1111 fork with AMD/Intel support
- InvokeAI - Professional-grade creative tool
- Stable Diffusion Locally - Complete getting started guide
- Flux Locally - Running Flux on consumer hardware
- ComfyUI vs A1111 vs Fooocus - Which UI to choose
- SDXL vs SD 1.5 vs Flux - VRAM, speed, and quality compared
- AI Art Styles & Workflows Guide - Creative techniques for SD and Flux
- ControlNet Guide for Beginners - Canny, OpenPose, Depth preprocessors
- AI Upscaling Locally - Real-ESRGAN, SUPIR, and ComfyUI workflows
- Best Photorealism Checkpoints - Juggernaut XL, RealVisXL, Realistic Vision, Flux
- Best Anime & Stylized Checkpoints - Illustrious XL, NoobAI-XL, Animagine, Pony
- Stable Diffusion on Mac - Draw Things, ComfyUI, MLX
- Local AI Video Generation - What actually works on consumer hardware
- ControlNet - Precise image control
- IP-Adapter - Style and subject transfer
- AnimateDiff - Video generation from SD
- Upscayl - AI image upscaling
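The SD-family tools above all denoise in a latent space downscaled 8x from pixel space with 4 channels (true for SD 1.5 and SDXL; Flux and SD 3.x use different latent layouts), which is why VRAM use climbs steeply with resolution. A quick sketch, with hedged diffusers usage in the comments:

```python
def latent_shape(width, height, channels=4, downscale=8):
    """SD 1.5 and SDXL denoise in a latent space downscaled 8x from pixels,
    with 4 latent channels (Flux and SD 3.x use different latent layouts)."""
    return (channels, height // downscale, width // downscale)

print(latent_shape(512, 512))    # SD 1.5 native → (4, 64, 64)
print(latent_shape(1024, 1024))  # SDXL native  → (4, 128, 128)

# Usage with diffusers (needs a GPU and a model download; illustrative only):
# from diffusers import StableDiffusionXLPipeline
# import torch
# pipe = StableDiffusionXLPipeline.from_pretrained(
#     "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
# ).to("cuda")
# pipe("a lighthouse at dusk, oil painting").images[0].save("out.png")
```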
Running autonomous AI agents locally.
- OpenClaw - Open-source AI agent framework
- AutoGPT - Autonomous GPT-4 agent
- CrewAI - Multi-agent orchestration
- LangChain - LLM application framework
- LlamaIndex - Data framework for LLM apps
- Haystack - NLP framework for agents
- LocalAgent - Local-first agent runtime with safe tool calling
- SmarterRouter - VRAM-aware LLM gateway for local AI
- OpenClaw Setup Guide - Local AI agent installation
- How OpenClaw Actually Works - Architecture guide: messages, heartbeats, crons, hooks, webhooks
- OpenClaw Security Guide - Hardening autonomous agents
- OpenClaw Security Feb 2026 - SSRF bypass, sandbox escapes, every fix
- Best Models for OpenClaw - Which Ollama models work for agents
- OpenClaw Model Combinations - What to pair for each task
- OpenClaw vs Commercial Agents - Local vs Lindy, Rabbit, etc.
- OpenClaw vs Cursor - Local AI agent or cloud IDE?
- OpenClaw on Mac - Apple Silicon setup and optimization
- OpenClaw on Raspberry Pi - What works and what doesn't on Pi 5
- OpenClaw on Low VRAM GPUs - 4GB, 6GB, and 8GB GPU guide
- OpenClaw Tool Call Failures - Why models break and how to fix them
- OpenClaw ClawHub Security Alert - 341 malicious skills found in marketplace
- ClawHub Malware Alert - The #1 skill was malware, how to protect yourself
- OpenClaw Plugins & Skills Guide - Navigating the skills ecosystem safely
- Best OpenClaw Tools & Extensions - Crabwalk, Tokscale, openclaw-docker
- OpenClaw Token Optimization - Reduce API costs for agent workflows
- OpenClaw Hardware Guide - Mac Mini, VPS, or PC for agents
- OpenClaw Local Zero API Costs - Run OpenClaw fully local
- OpenClaw Memory & Context Rot - Fix agent memory issues
- OpenClaw Model Routing - Route requests to different models
- OpenClaw After Steinberger - What the OpenAI move means for your setup
- OpenClaw Creator Joins OpenAI - What it means for local AI agents
- Best OpenClaw Alternatives - NanoClaw, Nanobot, ZeroClaw, LightClaw
- Every OpenClaw Alternative 2026 - Comprehensive comparison
- LightClaw - 7,000-line Python alternative to OpenClaw
- Building AI Agents with Local LLMs - Model requirements, VRAM budgets, framework comparison
- What Agents Can't Do Yet - Seven human capabilities missing from AI systems
- Agent Trust Decay - Why long-running agents get worse over time
- Intent Engineering for Agents - Why agents need more than context
- Intent Engineering Practical Guide - Goals, decision boundaries, value hierarchies
- The Agentic Web - What a parallel web for AI agents means for local builders
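Most agent frameworks above share the same inner loop: the model either emits a structured tool call or a final answer, and the runtime dispatches accordingly. A stripped-down sketch of one step (the JSON-or-text routing convention here is illustrative, not any specific framework's protocol):

```python
import json

def agent_step(model_reply, tools):
    """One step of a minimal agent loop: valid JSON is treated as a tool
    call and executed; anything else is returned as the final answer."""
    try:
        call = json.loads(model_reply)
    except json.JSONDecodeError:
        return ("final", model_reply)
    return ("observation", tools[call["tool"]](**call["args"]))

tools = {"add": lambda a, b: a + b}
print(agent_step('{"tool": "add", "args": {"a": 2, "b": 3}}', tools))
# → ('observation', 5)
print(agent_step("The answer is 5.", tools))
# → ('final', 'The answer is 5.')
```

Real frameworks add the pieces the loop leaves out, which is exactly where the security and reliability guides above come in: sandboxed tool execution, retry on malformed calls, and context management across steps.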
Going deeper into local AI.
- Beyond Transformers: 5 Architectures - What comes after Transformers
- Context Length Explained - How context windows work
- Hallucination Feedback Loop - When AI errors compound
- Session as RAG - Using conversation as retrieval
- AI Memory Wall - Why chatbots forget
- Distributed Wisdom Thinking Network - Multi-node reasoning
- Ouro 2B Thinking Model - Looped reasoning on small hardware
- Local RAG Guide - Search your documents with private AI
- Embedding Models for RAG - nomic-embed-text, Qwen3-Embedding, bge-m3 compared
- Ghost Knowledge - When your RAG system cites documents that no longer exist
- Chroma - Open-source embedding database
- Qdrant - Vector similarity search engine
- FAISS - Facebook's similarity search library
- Fine-Tuning on Consumer Hardware - LoRA and QLoRA guide
- Fine-Tuning on Mac - LoRA and QLoRA with MLX on Apple Silicon
- LoRA Training on Consumer Hardware - Fine-tune locally on your GPU
- NanoLlama Train From Scratch - Train your own Llama from zero
- Axolotl - Streamlined fine-tuning tool
- Unsloth - 2x faster fine-tuning
- PEFT - HuggingFace parameter-efficient fine-tuning
- Voice Chat with Local LLMs - Whisper + TTS setup
- Whisper - OpenAI's speech recognition
- whisper.cpp - Fast Whisper inference
- Coqui TTS - Text-to-speech synthesis
- Piper - Fast local TTS
- mycoSwarm vs Exo vs Petals - Distributed inference frameworks compared
- mycoSwarm WiFi Laptop Setup - Distributed AI on borrowed GPUs
- Why mycoSwarm Was Born - From Claude Code envy to building a distributed AI runtime
- Continue - VS Code/JetBrains AI assistant
- Tabby - Self-hosted code completion
- Aider - AI pair programming in terminal
- Codeium - Free AI code completion (cloud + local options)
- Replace GitHub Copilot with Local LLMs - Free, private AI code completion in VS Code
- Claude Code vs PI Agent - Cloud coding agent vs local alternative
- PI Agent + Ollama - Run a coding agent on local models
- Local Alternatives to Claude Code - Code agents without cloud
- 5 Levels of AI Coding - From autocomplete to dark factories
- CodeLlama vs DeepSeek vs Qwen Coder - Coding model comparison
- Token Audit Guide - Track what AI actually costs
- Tiered AI Model Strategy - When to use local vs cloud
- Model Routing for Local AI - Stop using one model for everything
- AI Tool Sprawl - Consolidation guide for too many AI tools
- Local LLMs vs ChatGPT - Honest comparison
- Local LLMs vs Claude - Honest comparison
- Pi AI vs Local AI - Cloud companion or private assistant?
- Cost to Run LLMs Locally - Real electricity and hardware costs
- Free Local AI vs Paid Cloud APIs - Break-even math with 2026 API pricing
- The Complexity Cliff - Why the jump from hello world to useful is so hard
- Prompt Debt - When your system prompt becomes unmaintainable
- AI Market Panic Explained - Running local puts you on the right side of the gap
- Local AI Privacy Guide - What's private, what leaks, and how to lock it down
- Structured Output from Local LLMs - Force valid JSON/YAML with grammar constraints
- Local AI Troubleshooting Guide - Fix common problems
- Ollama Troubleshooting Guide - Ollama-specific fixes
- Ollama Mac Troubleshooting - Metal, memory pressure, slow performance
- CUDA Out of Memory Fix - GPU memory errors solved
- Ollama Not Using GPU Fix - Force GPU usage in Ollama
- Ollama API Connection Refused - Port, Docker, and network fixes
- Open WebUI Not Connecting to Ollama - Docker networking and WSL2 fixes
- ROCm Not Detecting GPU Fix - AMD GPU detection issues
- Why Is My Local LLM Slow? - Speed diagnosis guide
- LLM Running Slow? Two Different Fixes - Separate prefill from generation speed
- Context Length Exceeded Fix - Fix token limit errors
- Memory Leak in Long Conversations - VRAM leak fix for long sessions
- GGUF File Won't Load - Format and compatibility fixes
- Model Outputs Garbage - Debug bad generations
- llama.cpp Build Errors - Common fixes for every platform
- Qwen2.5-VL LM Studio Troubleshooting - Fix mmproj and vision errors
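The RAG guides above all reduce to the same operation: embed your document chunks, embed the query, and rank chunks by cosine similarity. The math is small enough to show inline (toy 3-dimensional vectors; real embedding models like nomic-embed-text produce hundreds of dimensions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

doc_emb = [0.2, 0.8, 0.1]       # embedding of a stored chunk
query_emb = [0.25, 0.75, 0.05]  # embedding of the user's question
print(round(cosine(query_emb, doc_emb), 3))  # close to 1.0: strong match
```

Vector databases like Chroma, Qdrant, and FAISS (listed above) exist to do this ranking at scale, with indexing so you never compare the query against every chunk.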
Practical applications and scenario-specific guides.
- Local AI Use Cases Cloud Can't Touch - Privacy-first use cases only local can do
- Local AI for Lawyers - Confidential document analysis without cloud risk
- Local AI for Therapists - Session notes and treatment plans without cloud
- Local AI for Accountants - Tax prep and financial analysis locally
- Local AI for Accounting Privacy - Keep financial data off the cloud
- Local AI for Small Business - Replace $1,500/yr in AI subscriptions with a $600 mini PC
- Rescued Hardware, Rescued Bees - Building tech from what others throw away
- What Open Source Was Supposed to Be - Open source promised freedom; we got free labor
- Developmental Alignment - What if we just raised it well?
- Teaching AI About Death - A local AI describes her own death as Taoist philosophy
- Teaching AI What Love Means - What happens when you give a local AI an identity
Development logs, news reactions, and opinion pieces.
- GPT-5.4 Dropped. Here's Why I'm Not Switching - 1M context, beats humans on OSWorld, still costs money
- Qwen's Architect Just Walked Out the Door - Junyang Lin leaves Alibaba
- Wu Wei and the AI Agent That Did Too Much - Taoist non-action and agentic AI design
- Teaching AI to Accept Help: Day 4 With Monica - Local AI resists corrections, then self-diagnoses
- mycoSwarm Week 1: Four-Node Swarm - From idea to working distributed cluster
- mycoSwarm Week 2: Raspberry Pi Joins - Persistent memory, document RAG, WiFi GPU
- mycoSwarm Week 3: Unified Memory Search - Session-as-RAG, topic splitting, citation tracking
Where to get help and stay updated.
- r/LocalLLaMA - The main hub for local AI discussion
- r/StableDiffusion - Image generation community
- r/Ollama - Ollama-specific discussion
- r/Oobabooga - text-generation-webui community
- LocalLLaMA Discord - Active chat community
- Ollama Discord - Official Ollama server
- ComfyUI Discord - ComfyUI community
- Stable Diffusion Discord - Image generation community
- HuggingFace Discussions - Model-specific discussions
- llama.cpp GitHub Discussions - Technical questions
- LocalAI Slack - LocalAI community
Contributions welcome! Please read the contribution guidelines first.
- Add resources that are genuinely useful for running AI locally
- Include a brief description explaining why the resource is valuable
- Verify links are working and resources are actively maintained
- Keep the list organized and avoid duplicates
To the extent possible under law, the contributors have waived all copyright and related rights to this work.