Description
vLLM version: 0.11.0
The error output is as follows:
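For reference, here is a minimal sketch of the engine arguments implied by the log below. The model path and settings are copied from the `non-default args` log line; the `LLM(...)` call itself matches the `llm = LLM(**vars(engine_args))` line in the traceback. The reporter's actual `infer_embed.py` is not shown, so this is a reconstruction, not the original script:

```python
# Reconstruction of the engine arguments reported in the vLLM log below.
# Values are taken from the "non-default args" log line; the LLM(...) call
# is commented out because it requires vLLM plus the local checkpoint, and
# it is exactly the call that crashes in the traceback.
engine_args = dict(
    model="/ProjectRoot/Qwen3-VL/embedding/Qwen/Qwen3-VL-Embedding-8B",
    runner="pooling",
    trust_remote_code=True,
    dtype="bfloat16",
)

# from vllm import LLM
# llm = LLM(**engine_args)
# Raises: ValueError: Following weights were not initialized from checkpoint:
#         {'language_model.lm_head.weight'}
```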
/usr/local/lib/python3.11/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml # type: ignore[import]
INFO 01-28 19:04:16 [__init__.py:216] Automatically detected platform cuda.
Modular Diffusers is currently an experimental feature under active development. The API is subject to breaking changes in future releases.
Loading model from /ProjectRoot/Qwen3-VL/embedding/Qwen/Qwen3-VL-Embedding-8B...
INFO 01-28 19:04:23 [utils.py:233] non-default args: {'runner': 'pooling', 'trust_remote_code': True, 'dtype': 'bfloat16', 'model': '/ProjectRoot/Qwen3-VL/embedding/Qwen/Qwen3-VL-Embedding-8B'}
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
INFO 01-28 19:04:32 [model.py:833] Resolved --convert auto to --convert embed. Pass the value explicitly to silence this message.
INFO 01-28 19:04:32 [model.py:547] Resolved architecture: Qwen3VLForConditionalGeneration
torch_dtype is deprecated! Use dtype instead!
INFO 01-28 19:04:32 [model.py:1510] Using max model len 262144
INFO 01-28 19:04:32 [arg_utils.py:1575] (Enabling) chunked prefill by default
INFO 01-28 19:04:32 [arg_utils.py:1578] (Enabling) prefix caching by default
INFO 01-28 19:04:32 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=8192.
WARNING 01-28 19:04:33 [__init__.py:3036] We must use the spawn multiprocessing start method. Overriding VLLM_WORKER_MULTIPROC_METHOD to 'spawn'. See https://docs.vllm.ai/en/latest/usage/troubleshooting.html#python-multiprocessing for more information. Reasons: CUDA is initialized
/usr/local/lib/python3.11/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml # type: ignore[import]
INFO 01-28 19:04:36 [__init__.py:216] Automatically detected platform cuda.
Modular Diffusers is currently an experimental feature under active development. The API is subject to breaking changes in future releases.
(EngineCore_DP0 pid=5625) INFO 01-28 19:04:42 [core.py:644] Waiting for init message from front-end.
(EngineCore_DP0 pid=5625) INFO 01-28 19:04:42 [core.py:77] Initializing a V1 LLM engine (v0.11.0) with config: model='/ProjectRoot/Qwen3-VL/embedding/Qwen/Qwen3-VL-Embedding-8B', speculative_config=None, tokenizer='/ProjectRoot/Qwen3-VL/embedding/Qwen/Qwen3-VL-Embedding-8B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=262144, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=/ProjectRoot/Qwen3-VL/embedding/Qwen/Qwen3-VL-Embedding-8B, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=PoolerConfig(pooling_type='LAST', normalize=None, dimensions=None, enable_chunked_processing=None, max_embed_len=None, activation=None, logit_bias=None, softmax=None, step_tag_id=None, returned_token_ids=None), 
compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention","vllm.sparse_attn_indexer"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":1,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null}
(EngineCore_DP0 pid=5625) WARNING 01-28 19:04:42 [__init__.py:763] The environment variable HOST_IP is deprecated and ignored, as it is often used by Docker and other software to interact with the container's network stack. Please use VLLM_HOST_IP instead to set the IP address for vLLM processes to communicate with each other.
[W128 19:05:00.722262531 socket.cpp:755] [c10d] The client socket cannot be initialized to connect to [14-1-11-13.dl2-prod-instance-187.ide.svc.cluster.local]:53617 (errno: 97 - Address family not supported by protocol).
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(EngineCore_DP0 pid=5625) INFO 01-28 19:05:00 [parallel_state.py:1208] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
(EngineCore_DP0 pid=5625) INFO 01-28 19:05:00 [topk_topp_sampler.py:55] Using FlashInfer for top-p & top-k sampling.
(EngineCore_DP0 pid=5625) INFO 01-28 19:05:07 [gpu_model_runner.py:2602] Starting to load model /ProjectRoot/Qwen3-VL/embedding/Qwen/Qwen3-VL-Embedding-8B...
(EngineCore_DP0 pid=5625) INFO 01-28 19:05:07 [gpu_model_runner.py:2634] Loading model from scratch...
(EngineCore_DP0 pid=5625) INFO 01-28 19:05:07 [cuda.py:366] Using Flash Attention backend on V1 engine.
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] ST projector loading failed
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] Traceback (most recent call last):
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] File "/usr/local/lib/python3.11/site-packages/vllm/model_executor/models/adapters.py", line 40, in _load_st_projector
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] modules = get_hf_file_to_dict("modules.json", model_config.model,
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] File "/usr/local/lib/python3.11/site-packages/vllm/transformers_utils/config.py", line 687, in get_hf_file_to_dict
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] hf_hub_file = hf_hub_download(model, file_name, revision=revision)
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] File "/usr/local/lib/python3.11/site-packages/modelscope/utils/hf_util/patcher.py", line 531, in _file_download
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] return file_download(
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] File "/usr/local/lib/python3.11/site-packages/modelscope/hub/file_download.py", line 89, in model_file_download
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] return _repo_file_download(
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] ^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] File "/usr/local/lib/python3.11/site-packages/modelscope/hub/file_download.py", line 217, in _repo_file_download
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] endpoint = _api.get_endpoint_for_read(repo_id=repo_id, repo_type=repo_type)
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] File "/usr/local/lib/python3.11/site-packages/modelscope/hub/api.py", line 522, in get_endpoint_for_read
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] if not self.repo_exists(
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] ^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] File "/usr/local/lib/python3.11/site-packages/modelscope/hub/api.py", line 670, in repo_exists
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] raise Exception('Invalid repo_id: %s, must be of format namespace/name' % repo_type)
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] Exception: Invalid repo_id: model, must be of format namespace/name
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:35<01:47, 35.88s/it]
Loading safetensors checkpoint shards: 50% Completed | 2/4 [01:11<01:11, 35.68s/it]
Loading safetensors checkpoint shards: 75% Completed | 3/4 [01:47<00:35, 35.86s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [01:58<00:00, 26.02s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [01:58<00:00, 29.61s/it]
(EngineCore_DP0 pid=5625)
(EngineCore_DP0 pid=5625) INFO 01-28 19:07:06 [default_loader.py:267] Loading weights took 118.45 seconds
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] self._init_executor()
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 55, in _init_executor
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] self.collective_rpc("load_model")
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] return func(*args, **kwargs)
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 213, in load_model
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2635, in load_model
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] self.model = model_loader.load_model(
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] self.load_weights(model, model_config)
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/model_executor/model_loader/default_loader.py", line 276, in load_weights
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] raise ValueError("Following weights were not initialized from "
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] ValueError: Following weights were not initialized from checkpoint: {'language_model.lm_head.weight'}
(EngineCore_DP0 pid=5625) Process EngineCore_DP0:
(EngineCore_DP0 pid=5625) Traceback (most recent call last):
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=5625) self.run()
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=5625) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=5625) raise e
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=5625) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=5625) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=5625) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=5625) self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=5625) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=5625) self._init_executor()
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 55, in _init_executor
(EngineCore_DP0 pid=5625) self.collective_rpc("load_model")
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=5625) return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=5625) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=5625) return func(*args, **kwargs)
(EngineCore_DP0 pid=5625) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 213, in load_model
(EngineCore_DP0 pid=5625) self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2635, in load_model
(EngineCore_DP0 pid=5625) self.model = model_loader.load_model(
(EngineCore_DP0 pid=5625) ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(EngineCore_DP0 pid=5625) self.load_weights(model, model_config)
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/model_executor/model_loader/default_loader.py", line 276, in load_weights
(EngineCore_DP0 pid=5625) raise ValueError("Following weights were not initialized from "
(EngineCore_DP0 pid=5625) ValueError: Following weights were not initialized from checkpoint: {'language_model.lm_head.weight'}
[rank0]:[W128 19:07:07.772602278 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
Traceback (most recent call last):
File "/ProjectRoot/Qwen3-VL/embedding/infer_embed.py", line 164, in <module>
main()
File "/ProjectRoot/Qwen3-VL/embedding/infer_embed.py", line 136, in main
llm = LLM(**vars(engine_args))
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/vllm/entrypoints/llm.py", line 297, in __init__
self.llm_engine = LLMEngine.from_engine_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/llm_engine.py", line 177, in from_engine_args
return cls(vllm_config=vllm_config,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/llm_engine.py", line 114, in __init__
self.engine_core = EngineCoreClient.make_client(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 80, in make_client
return SyncMPClient(vllm_config, executor_class, log_stats)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 602, in __init__
super().__init__(
File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 448, in __init__
with launch_core_engines(vllm_config, executor_class,
File "/usr/local/lib/python3.11/contextlib.py", line 144, in __exit__
next(self.gen)
File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/utils.py", line 732, in launch_core_engines
wait_for_engine_startup(
File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}