
# llm-fitsizer

Calculate the largest quantized LLM that fits in the available RAM.

```sh
npm install llm-fitsizer
```


```typescript
import { calculateMaxModel } from 'llm-fitsizer';

const available = 16 * 1024; // 16GB RAM in MB
const result = calculateMaxModel(available, {
  quantization: 'Q4_K_M',
  contextSize: 4096,
  overhead: 0.2
});

console.log(result);
// { maxParams: 13, quant: 'Q4_K_M', estimatedUsage: 14336 }
```

## Notes

Estimates account for the context window, the KV cache, and runtime overhead. The default overhead is 20%, which covers the OS and the inference engine.
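To make the three terms above concrete, here is a minimal back-of-the-envelope sketch. It is not llm-fitsizer's implementation: every name in it (`MemoryInputs`, `weightsMiB`, `kvCacheMiB`, `sketchEstimateMiB`) is hypothetical, the KV cache formula is the usual one for a standard transformer, and it assumes the overhead is applied as a multiplier on top of weights plus cache.

```typescript
// Hypothetical sketch; none of these names are part of llm-fitsizer.
interface MemoryInputs {
  paramsBillions: number;    // model size in billions of parameters
  bitsPerWeight: number;     // average bits per weight for the quantization
  layers: number;            // number of transformer layers
  kvHeads: number;           // number of key/value heads
  headDim: number;           // dimension of each attention head
  contextSize: number;       // tokens held in the KV cache
  kvBytesPerElement: number; // 2 for an f16 cache
  overhead: number;          // fraction, e.g. 0.2 for 20%
}

const BYTES_PER_MIB = 1024 * 1024;

// Memory for the quantized weights alone.
function weightsMiB(paramsBillions: number, bitsPerWeight: number): number {
  return (paramsBillions * 1e9 * bitsPerWeight) / 8 / BYTES_PER_MIB;
}

// Memory for the KV cache: two tensors (K and V) per layer, each storing
// kvHeads * headDim elements for every token in the context.
function kvCacheMiB(
  layers: number,
  kvHeads: number,
  headDim: number,
  contextSize: number,
  bytesPerElement: number
): number {
  const bytes = 2 * layers * kvHeads * headDim * contextSize * bytesPerElement;
  return bytes / BYTES_PER_MIB;
}

// Assumption: overhead is a multiplier applied to weights + KV cache.
function sketchEstimateMiB(m: MemoryInputs): number {
  const base =
    weightsMiB(m.paramsBillions, m.bitsPerWeight) +
    kvCacheMiB(m.layers, m.kvHeads, m.headDim, m.contextSize, m.kvBytesPerElement);
  return base * (1 + m.overhead);
}

// Illustrative 7B-class shape; the numbers are assumptions, not library data.
const example: MemoryInputs = {
  paramsBillions: 7,
  bitsPerWeight: 4.5,
  layers: 32,
  kvHeads: 32,
  headDim: 128,
  contextSize: 4096,
  kvBytesPerElement: 2,
  overhead: 0.2,
};

console.log(sketchEstimateMiB(example).toFixed(0), 'MiB');
```

For this shape the KV cache alone comes to 2048 MiB at a 4096-token context, so context size can matter nearly as much as the choice of quantization for small models.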

Supported quantization formats:

- Q2_K, Q3_K_S, Q3_K_M, Q3_K_L
- Q4_0, Q4_1, Q4_K_S, Q4_K_M
- Q5_0, Q5_1, Q5_K_S, Q5_K_M
- Q6_K, Q8_0
- F16, F32
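As a rough illustration of how these formats trade size for precision, the sketch below picks the highest-precision format whose weights fit a memory budget. It assumes the names refer to the GGML/llama.cpp quantization types, where the fixed-block formats have a known storage cost per weight (for example, Q4_0 stores 32 four-bit weights plus a 16-bit scale, which is 4.5 bits per weight). The K-quant variants are left out because their effective bits per weight depend on the model's tensor mix. `FORMATS_BY_PRECISION` and `pickHighestPrecisionFit` are hypothetical names, not part of llm-fitsizer, and the sketch counts weights only, ignoring KV cache and overhead.

```typescript
// Hypothetical sketch; not llm-fitsizer's table or selection logic.
// Bits per weight for fixed-block formats, ordered from highest to lowest.
const FORMATS_BY_PRECISION: ReadonlyArray<[string, number]> = [
  ['F32', 32],
  ['F16', 16],
  ['Q8_0', 8.5],
  ['Q5_1', 6.0],
  ['Q5_0', 5.5],
  ['Q4_1', 5.0],
  ['Q4_0', 4.5],
];

// Returns the first (highest-precision) format whose weights fit the
// budget, or null if none of the listed formats fit.
function pickHighestPrecisionFit(
  paramsBillions: number,
  budgetMiB: number
): string | null {
  for (const [name, bits] of FORMATS_BY_PRECISION) {
    const sizeMiB = (paramsBillions * 1e9 * bits) / 8 / (1024 * 1024);
    if (sizeMiB <= budgetMiB) {
      return name;
    }
  }
  return null;
}

console.log(pickHighestPrecisionFit(7, 8 * 1024));
```

A real selection would also subtract the KV cache and overhead from the budget before comparing, which is why the result here is only an upper bound on what fits.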
```typescript
import { calculateMaxModel, estimateMemory, getBestFit } from 'llm-fitsizer';

// Estimate memory usage for a specific model
const usage = estimateMemory({
  params: 70,
  quantization: 'Q4_K_M',
  contextSize: 8192,
  batchSize: 512
});

// Find the best quantization for the available RAM
const best = getBestFit(24 * 1024, {
  params: 34,
  contextSize: 4096,
  minQuantization: 'Q4_K_M' // don't go below this quality level
});

// Calculate for a multi-GPU setup
const distributed = calculateMaxModel(48 * 1024, {
  gpuCount: 2,
  gpuOffloadLayers: 32
});
```

## License

MIT
