Measurement config (`mode: MEASURE`), added as a single-line file (pretty-printed here):

```json
{
  "method": "HOOKS",
  "mode": "MEASURE",
  "observer": "maxabs",
  "allowlist": {"types": [], "names": []},
  "blocklist": {"types": [], "names": []},
  "quantize_weight": false,
  "dump_stats_path": "/eager_output/llama-4-maverick-17b-128e-instruct/g3/inc_output"
}
```
Quantization config (`mode: QUANTIZE`), added as a single-line file (pretty-printed here):

```json
{
  "mode": "QUANTIZE",
  "observer": "maxabs",
  "scale_method": "maxabs_hw",
  "allowlist": {"types": [], "names": []},
  "blocklist": {"types": [], "names": []},
  "dump_stats_path": "./g3/inc_output"
}
```
41 changes: 41 additions & 0 deletions .static_quant/1.22.0/README.md
# Static Quantization
The steps below use the Llama-4-Maverick-17B-128E-Instruct model as an example.

## Configuration

1. Locate the file `maxabs_quant_g3.json` inside the model's quantization folder (for this example, `.static_quant/1.22.0/Llama-4-Maverick-17B-128E-Instruct`).
2. Edit it and set the `dump_stats_path` parameter to an absolute path inside your clone of the repository.

Example:

```json
"dump_stats_path": "/root/vllm-fork/.static_quant/1.22.0/Llama-4-Maverick-17B-128E-Instruct/g3/inc_output"
```
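If you prefer not to edit the file by hand, a small script can rewrite `dump_stats_path` for you. This is a sketch, not part of the official flow; it assumes you run it from the directory containing `maxabs_quant_g3.json` and that the stats directory follows the `g3/inc_output` layout shown above:

```shell
# Rewrite dump_stats_path in maxabs_quant_g3.json to an absolute path
# rooted at the current working directory (assumed to be inside your clone).
CFG=maxabs_quant_g3.json
python3 - "$CFG" <<'EOF'
import json, os, sys

path = sys.argv[1]
with open(path) as f:
    cfg = json.load(f)

# Resolve the stats directory relative to the current working directory.
cfg["dump_stats_path"] = os.path.abspath("g3/inc_output")

with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
EOF
grep dump_stats_path "$CFG"
```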

## Environment Variable
Export the `QUANT_CONFIG` environment variable before running the server. It must point to the location of `maxabs_quant_g3.json`.

Example:

```bash
export QUANT_CONFIG='/root/vllm-fork/.static_quant/1.22.0/Llama-4-Maverick-17B-128E-Instruct/maxabs_quant_g3.json'
```
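A quick sanity check before launching confirms the variable resolves to a real file (the messages here are illustrative, not produced by vLLM):

```shell
# Fail fast if QUANT_CONFIG is unset or does not point at an existing file.
if [ -f "$QUANT_CONFIG" ]; then
  echo "QUANT_CONFIG ok: $QUANT_CONFIG"
else
  echo "QUANT_CONFIG missing or unset: $QUANT_CONFIG" >&2
fi
```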

## Run vLLM Server

Start the vLLM server with quantization enabled:

```bash
vllm serve meta-llama/Llama-4-Maverick-17B-128E-Instruct \
--quantization inc \
--kv-cache-dtype fp8_inc \
--weights-load-device cpu \
--tensor-parallel-size 8 \
--max-model-len 2048
```
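Once the server is up, you can smoke-test it through vLLM's OpenAI-compatible API. The host and port below are assumptions for a default local run (vLLM listens on port 8000 unless `--port` is set); the model name matches the one passed to `vllm serve`:

```shell
# Send a minimal chat completion request to the local vLLM endpoint.
curl -s http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "meta-llama/Llama-4-Maverick-17B-128E-Instruct",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 32
      }'
```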

## Notes

1. The `dump_stats_path` in `maxabs_quant_g3.json` must be an absolute path.
2. `QUANT_CONFIG` must be exported before running `vllm serve`.
3. Adjust `--tensor-parallel-size` and `--max-model-len` according to your system resources.