Description
Your current environment
The model is qwen3-0.6b, and the exact quantization parameters are as follows:
python /home/ma-user/work/msit/msmodelslim/quant_qwen.py \
    --model_path /home/ma-user/work/checkpoint-906 \
    --save_directory /home/ma-user/work/Quant/output_IFD_906 \
    --device_type npu \
    --model_type qwen3 \
    --calib_file /home/ma-user/work/Quant/IFD_cali.jsonl \
    --w_bit 8 \
    --a_bit 8 \
    --group_size 256 \
    --trust_remote_code True
On a 910A, I quantized the Qwen model with modelslim, and inference with vLLM then fails. I checked the model files produced by the quantization script and nothing is missing, so why does the error still complain about a missing quant_config?
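As a sanity check I can run something like the following against the quantized output directory (the "quantization_config" key and the things to look for are my assumptions, not something I know vLLM or modelslim requires):

import json
import os

# Hypothetical diagnostic: list what modelslim actually wrote and check whether
# config.json carries a quantization section that vLLM could pick up.
model_dir = "/home/ma-user/work/Quant/output_IFD"  # same path passed to LLM()
print(sorted(os.listdir(model_dir)))

with open(os.path.join(model_dir, "config.json")) as f:
    cfg = json.load(f)
# vLLM normally reads "quantization_config" from the HF config; whether
# modelslim writes this key here is an assumption on my part.
print(cfg.get("quantization_config"))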
The inference code is as follows:
import torch
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, top_k=40)
llm = LLM(model="/home/ma-user/work/Quant/output_IFD",
          max_model_len=2048,
          enforce_eager=True,
          trust_remote_code=True,
          # Enable quantization by specifying quantization="ascend"
          quantization="ascend")
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
The error output is as follows:
(python-3.9.10) [ma-user work]$python /home/ma-user/work/Quant/competition_model_GPTQ.py
INFO 08-07 16:25:29 [__init__.py:30] Available plugins for group vllm.platform_plugins:
INFO 08-07 16:25:29 [__init__.py:32] name=ascend, value=vllm_ascend:register
INFO 08-07 16:25:29 [__init__.py:34] all available plugins for group vllm.platform_plugins will be loaded.
INFO 08-07 16:25:29 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 08-07 16:25:29 [__init__.py:44] plugin ascend loaded.
INFO 08-07 16:25:29 [__init__.py:230] Platform plugin ascend is activated
WARNING:root:Warning: Failed to register custom ops, all custom ops will be disabled
INFO 08-07 16:25:31 [__init__.py:30] Available plugins for group vllm.general_plugins:
INFO 08-07 16:25:31 [__init__.py:32] name=ascend_enhanced_model, value=vllm_ascend:register_model
INFO 08-07 16:25:31 [__init__.py:34] all available plugins for group vllm.general_plugins will be loaded.
INFO 08-07 16:25:31 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 08-07 16:25:31 [__init__.py:44] plugin ascend_enhanced_model loaded.
INFO 08-07 16:25:31 [patch_tritonplaceholder.py:33] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 08-07 16:25:31 [patch_tritonplaceholder.py:46] Triton is not installed. Using dummy decorators. Install it via pip install triton to enable kernel compilation.
INFO 08-07 16:25:31 [patch_tritonplaceholder.py:71] Triton module has been replaced with a placeholder.
WARNING 08-07 16:25:31 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 08-07 16:25:33 [registry.py:380] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 08-07 16:25:33 [registry.py:380] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:CustomQwen2VLForConditionalGeneration.
WARNING 08-07 16:25:33 [registry.py:380] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 08-07 16:25:33 [registry.py:380] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
INFO 08-07 16:25:47 [config.py:689] This model supports multiple tasks: {'embed', 'generate', 'classify', 'score', 'reward'}. Defaulting to 'generate'.
WARNING 08-07 16:25:47 [config.py:768] ascend quantization is not fully optimized yet. The speed can be slower than non-quantized models.
INFO 08-07 16:25:47 [arg_utils.py:1742] npu is experimental on VLLM_USE_V1=1. Falling back to V0 Engine.
INFO 08-07 16:25:47 [config.py:1747] Disabled the custom all-reduce kernel because it is not supported on current platform.
Traceback (most recent call last):
  File "/home/ma-user/work/Quant/competition_model_GPTQ.py", line 242, in <module>
    comp = Competition()
  File "/home/ma-user/work/Quant/competition_model_GPTQ.py", line 35, in __init__
    self.llm = LLM(
  File "/home/ma-user/work/requirements/vllm/vllm/utils.py", line 1099, in inner
    return fn(*args, **kwargs)
  File "/home/ma-user/work/requirements/vllm/vllm/entrypoints/llm.py", line 248, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
  File "/home/ma-user/work/requirements/vllm/vllm/engine/llm_engine.py", line 515, in from_engine_args
    vllm_config = engine_args.create_engine_config(usage_context)
  File "/home/ma-user/work/requirements/vllm/vllm/engine/arg_utils.py", line 1335, in create_engine_config
    config = VllmConfig(
  File "<string>", line 19, in __init__
  File "/home/ma-user/work/requirements/vllm/vllm/config.py", line 3709, in __post_init__
    self.quant_config = VllmConfig._get_quantization_config(
  File "/home/ma-user/work/requirements/vllm/vllm/config.py", line 3651, in _get_quantization_config
    quant_config = get_quant_config(model_config, load_config)
  File "/home/ma-user/work/requirements/vllm/vllm/model_executor/model_loader/weight_utils.py", line 195, in get_quant_config
    return quant_cls()
TypeError: __init__() missing 1 required positional argument: 'quant_config'
[ERROR] 2025-08-07-16:25:48 (PID:258515, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception
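For what it's worth, my reading of the call that fails (a rough paraphrase of get_quant_config in vllm/model_executor/model_loader/weight_utils.py based on the traceback, not a verbatim copy of the source) is something like:

# Rough paraphrase of vLLM's get_quant_config(); names and branches are my
# reconstruction from the traceback, not the actual implementation.
from vllm.model_executor.layers.quantization import get_quantization_config

def get_quant_config_sketch(model_config, load_config):
    # Resolves to the ascend quantization config class when quantization="ascend".
    quant_cls = get_quantization_config(model_config.quantization)
    # vLLM first tries to read a quantization config from the HF model config.
    hf_quant_config = getattr(model_config.hf_config, "quantization_config", None)
    if hf_quant_config is not None:
        return quant_cls.from_config(hf_quant_config)
    # Otherwise it can fall back to constructing the class with no arguments --
    # the `return quant_cls()` at weight_utils.py:195 in the traceback -- which
    # fails because the ascend config class requires a `quant_config` argument.
    return quant_cls()

If that reading is correct, the no-arg fallback would mean vLLM did not pick up a quantization config from my output directory, even though the files produced by quantization are all there.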