# llm-fitsizer
Calculate the largest quantized LLM that fits in the available RAM.

```bash
npm install llm-fitsizer
```

```typescript
import { calculateMaxModel } from 'llm-fitsizer';

const available = 16 * 1024; // 16 GB of RAM, expressed in MB
const result = calculateMaxModel(available, {
  quantization: 'Q4_K_M',
  contextSize: 4096,
  overhead: 0.2
});

console.log(result);
// { maxParams: 13, quant: 'Q4_K_M', estimatedUsage: 14336 }
```

The estimate accounts for the context window, KV cache, and runtime overhead. By default, 20% overhead is assumed for the OS and inference engine.

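
To make the KV cache and overhead terms concrete, here is a minimal sketch of how they could be estimated. The helpers `kvCacheMB` and `usableMB` and the `KvCacheShape` type are hypothetical and not part of llm-fitsizer's API; the sketch also assumes `overhead` is a fraction of available RAM held in reserve, which the library may handle differently.

```typescript
// Hypothetical helpers for illustration; not exported by llm-fitsizer.
interface KvCacheShape {
  layers: number;          // transformer layers
  kvHeads: number;         // key/value heads (fewer than attention heads under GQA)
  headDim: number;         // dimension of each head
  contextSize: number;     // tokens held in the cache
  bytesPerElement: number; // 2 for an F16 cache
}

// KV cache size in MB: one key tensor and one value tensor per layer.
function kvCacheMB(s: KvCacheShape): number {
  const bytes =
    2 * s.layers * s.kvHeads * s.headDim * s.contextSize * s.bytesPerElement;
  return bytes / (1024 * 1024);
}

// Assumption: overhead is a fraction of available RAM reserved for the OS and runtime.
function usableMB(availableMB: number, overhead: number): number {
  return availableMB * (1 - overhead);
}

// A 32-layer model with 32 KV heads of dimension 128 at a 4096-token context.
const cache = kvCacheMB({
  layers: 32,
  kvHeads: 32,
  headDim: 128,
  contextSize: 4096,
  bytesPerElement: 2,
});
console.log(cache); // 2048
```

Because the cache grows linearly with `contextSize`, doubling the context in this sketch doubles the cache size, which is why the context window matters alongside the weights.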
Supported quantization formats:
- Q2_K, Q3_K_S, Q3_K_M, Q3_K_L
- Q4_0, Q4_1, Q4_K_S, Q4_K_M
- Q5_0, Q5_1, Q5_K_S, Q5_K_M
- Q6_K, Q8_0
- F16, F32
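
As a rough illustration of why the format matters, the sketch below estimates weight size from bits per weight. The table and the `weightsMB` helper are hypothetical and not taken from llm-fitsizer; the values are the per-block storage costs of the corresponding GGML formats, and `params` is assumed to be in billions, as the usage examples suggest. The `_S`/`_M`/`_L` K-quant mixes are left out because their effective bits per weight vary from tensor to tensor.

```typescript
// Approximate bits per weight for formats with a fixed block layout.
// Illustrative only; llm-fitsizer may use different figures internally.
const APPROX_BITS_PER_WEIGHT = {
  Q4_0: 4.5,
  Q4_1: 5.0,
  Q5_0: 5.5,
  Q5_1: 6.0,
  Q6_K: 6.5625,
  Q8_0: 8.5,
  F16: 16,
  F32: 32,
} as const;

type KnownQuant = keyof typeof APPROX_BITS_PER_WEIGHT;

// Weight size in MB for a model with `paramsBillions` billion parameters.
function weightsMB(paramsBillions: number, quant: KnownQuant): number {
  const bytes = (paramsBillions * 1e9 * APPROX_BITS_PER_WEIGHT[quant]) / 8;
  return bytes / (1024 * 1024);
}

// A 7B model: roughly 3.7 GB at Q4_0 versus roughly 13 GB at F16.
console.log(Math.round(weightsMB(7, 'Q4_0')));
console.log(Math.round(weightsMB(7, 'F16')));
```

This covers the weights only; the KV cache and runtime overhead come on top of it.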

```typescript
import { calculateMaxModel, estimateMemory, getBestFit } from 'llm-fitsizer';

// Estimate memory usage for a specific model
const usage = estimateMemory({
  params: 70,
  quantization: 'Q4_K_M',
  contextSize: 8192,
  batchSize: 512
});

// Find the best quantization for the available RAM
const best = getBestFit(24 * 1024, {
  params: 34,
  contextSize: 4096,
  minQuantization: 'Q4_K_M' // don't go below this quality level
});

// Calculate for a multi-GPU setup
const distributed = calculateMaxModel(48 * 1024, {
  gpuCount: 2,
  gpuOffloadLayers: 32
});
```

License: MIT