vLLM inference fails on a qwen3 model quantized with modelslim #2290

@KeLomax

Description

Your current environment

The model is qwen3-0.6b; the exact quantization parameters are as follows:

python /home/ma-user/work/msit/msmodelslim/quant_qwen.py \
    --model_path /home/ma-user/work/checkpoint-906 \
    --save_directory /home/ma-user/work/Quant/output_IFD_906 \
    --device_type npu \
    --model_type qwen3 \
    --calib_file /home/ma-user/work/Quant/IFD_cali.jsonl \
    --w_bit 8 \
    --a_bit 8 \
    --group_size 256 \
    --trust_remote_code True

On a 910A, I quantized the qwen model with modelslim, and inference with vLLM then fails. I checked the model files produced by the quantization script and none are missing. Why does the error still complain about a missing quant_config?
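
A minimal diagnostic sketch, on the assumption that vLLM resolves the quantization config from the checkpoint's config.json (the Hugging Face "quantization_config" convention) rather than from the weight files themselves; paths are taken from this issue:

import json
import os

# Hedged check: the weights can all be present and the quant-config lookup
# can still fail if config.json has no "quantization_config" section.
# Note: the quantization command above saved to .../output_IFD_906, while
# the inference script below loads .../output_IFD.
model_dir = "/home/ma-user/work/Quant/output_IFD"
print(sorted(os.listdir(model_dir)))

with open(os.path.join(model_dir, "config.json")) as f:
    config = json.load(f)
print("quantization_config" in config)  # False would match the TypeError below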

The inference code is as follows:

import torch
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.6, top_p=0.95, top_k=40)

llm = LLM(model="/home/ma-user/work/Quant/output_IFD",
          max_model_len=2048,
          enforce_eager=True,
          trust_remote_code=True,
          # Enable quantization by specifying quantization="ascend"
          quantization="ascend")

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

The error output is as follows:
(python-3.9.10) [ma-user work]$python /home/ma-user/work/Quant/competition_model_GPTQ.py
INFO 08-07 16:25:29 [__init__.py:30] Available plugins for group vllm.platform_plugins:
INFO 08-07 16:25:29 [__init__.py:32] name=ascend, value=vllm_ascend:register
INFO 08-07 16:25:29 [__init__.py:34] all available plugins for group vllm.platform_plugins will be loaded.
INFO 08-07 16:25:29 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 08-07 16:25:29 [__init__.py:44] plugin ascend loaded.
INFO 08-07 16:25:29 [__init__.py:230] Platform plugin ascend is activated
WARNING:root:Warning: Failed to register custom ops, all custom ops will be disabled
INFO 08-07 16:25:31 [__init__.py:30] Available plugins for group vllm.general_plugins:
INFO 08-07 16:25:31 [__init__.py:32] name=ascend_enhanced_model, value=vllm_ascend:register_model
INFO 08-07 16:25:31 [__init__.py:34] all available plugins for group vllm.general_plugins will be loaded.
INFO 08-07 16:25:31 [__init__.py:36] set environment variable VLLM_PLUGINS to control which plugins to load.
INFO 08-07 16:25:31 [__init__.py:44] plugin ascend_enhanced_model loaded.
INFO 08-07 16:25:31 [patch_tritonplaceholder.py:33] Triton not installed or not compatible; certain GPU-related functions will not be available.
WARNING 08-07 16:25:31 [patch_tritonplaceholder.py:46] Triton is not installed. Using dummy decorators. Install it via pip install triton to enable kernel compilation.
INFO 08-07 16:25:31 [patch_tritonplaceholder.py:71] Triton module has been replaced with a placeholder.
WARNING 08-07 16:25:31 [_custom_ops.py:21] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
WARNING 08-07 16:25:33 [registry.py:380] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
WARNING 08-07 16:25:33 [registry.py:380] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:CustomQwen2VLForConditionalGeneration.
WARNING 08-07 16:25:33 [registry.py:380] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
WARNING 08-07 16:25:33 [registry.py:380] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
INFO 08-07 16:25:47 [config.py:689] This model supports multiple tasks: {'embed', 'generate', 'classify', 'score', 'reward'}. Defaulting to 'generate'.
WARNING 08-07 16:25:47 [config.py:768] ascend quantization is not fully optimized yet. The speed can be slower than non-quantized models.
INFO 08-07 16:25:47 [arg_utils.py:1742] npu is experimental on VLLM_USE_V1=1. Falling back to V0 Engine.
INFO 08-07 16:25:47 [config.py:1747] Disabled the custom all-reduce kernel because it is not supported on current platform.
Traceback (most recent call last):
File "/home/ma-user/work/Quant/competition_model_GPTQ.py", line 242, in
comp = Competition()
File "/home/ma-user/work/Quant/competition_model_GPTQ.py", line 35, in init
self.llm = LLM(
File "/home/ma-user/work/requirements/vllm/vllm/utils.py", line 1099, in inner
return fn(*args, **kwargs)
File "/home/ma-user/work/requirements/vllm/vllm/entrypoints/llm.py", line 248, in init
self.llm_engine = LLMEngine.from_engine_args(
File "/home/ma-user/work/requirements/vllm/vllm/engine/llm_engine.py", line 515, in from_engine_args
vllm_config = engine_args.create_engine_config(usage_context)
File "/home/ma-user/work/requirements/vllm/vllm/engine/arg_utils.py", line 1335, in create_engine_config
config = VllmConfig(
File "", line 19, in init
File "/home/ma-user/work/requirements/vllm/vllm/config.py", line 3709, in post_init
self.quant_config = VllmConfig._get_quantization_config(
File "/home/ma-user/work/requirements/vllm/vllm/config.py", line 3651, in _get_quantization_config
quant_config = get_quant_config(model_config, load_config)
File "/home/ma-user/work/requirements/vllm/vllm/model_executor/model_loader/weight_utils.py", line 195, in get_quant_config
return quant_cls()
TypeError: init() missing 1 required positional argument: 'quant_config'
[ERROR] 2025-08-07-16:25:48 (PID:258515, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception
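
For context, the failing frame corresponds roughly to the following logic in vllm/model_executor/model_loader/weight_utils.py. This is a simplified paraphrase based on the traceback and the vLLM source of that era, not a verbatim quote; consult the installed vLLM for the exact code:

from vllm.model_executor.layers.quantization import get_quantization_config

# Simplified paraphrase of vLLM's get_quant_config (weight_utils.py).
def get_quant_config(model_config, load_config):
    quant_cls = get_quantization_config(model_config.quantization)
    # 1. Prefer the "quantization_config" section of the checkpoint's config.json.
    hf_quant_config = getattr(model_config.hf_config, "quantization_config", None)
    if hf_quant_config is not None:
        return quant_cls.from_config(hf_quant_config)
    # 2. Otherwise look for the method's own config files in the model directory.
    if not quant_cls.get_config_filenames():
        # 3. Fall back to a no-argument constructor -- the traceback's line 195.
        #    AscendQuantConfig.__init__ requires a quant_config dict, hence:
        #    TypeError: __init__() missing 1 required positional argument.
        return quant_cls()
    ...  # file-based lookup omitted

If that reading is right, the presence of the weight files is beside the point: vLLM never opens them, because it finds no quantization_config section in config.json and the ascend method advertises no config filenames, so it falls back to the no-argument constructor and fails.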

How would you like to use vllm on ascend

I want to run inference of a [specific model](put link here). I don't know how to integrate it with vllm.
