
VL-Embedding-2B works, but loading the 8B model fails #48

@wincle

Description


vLLM version: 0.11.0
The error output is as follows:

/usr/local/lib/python3.11/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml # type: ignore[import]
INFO 01-28 19:04:16 [__init__.py:216] Automatically detected platform cuda.
Modular Diffusers is currently an experimental feature under active development. The API is subject to breaking changes in future releases.
Loading model from /ProjectRoot/Qwen3-VL/embedding/Qwen/Qwen3-VL-Embedding-8B...
INFO 01-28 19:04:23 [utils.py:233] non-default args: {'runner': 'pooling', 'trust_remote_code': True, 'dtype': 'bfloat16', 'model': '/ProjectRoot/Qwen3-VL/embedding/Qwen/Qwen3-VL-Embedding-8B'}
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
INFO 01-28 19:04:32 [model.py:833] Resolved --convert auto to --convert embed. Pass the value explicitly to silence this message.
INFO 01-28 19:04:32 [model.py:547] Resolved architecture: Qwen3VLForConditionalGeneration
torch_dtype is deprecated! Use dtype instead!
INFO 01-28 19:04:32 [model.py:1510] Using max model len 262144
INFO 01-28 19:04:32 [arg_utils.py:1575] (Enabling) chunked prefill by default
INFO 01-28 19:04:32 [arg_utils.py:1578] (Enabling) prefix caching by default
INFO 01-28 19:04:32 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=8192.
WARNING 01-28 19:04:33 [__init__.py:3036] We must use the spawn multiprocessing start method. Overriding VLLM_WORKER_MULTIPROC_METHOD to 'spawn'. See https://docs.vllm.ai/en/latest/usage/troubleshooting.html#python-multiprocessing for more information. Reasons: CUDA is initialized
/usr/local/lib/python3.11/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml # type: ignore[import]
INFO 01-28 19:04:36 [__init__.py:216] Automatically detected platform cuda.
Modular Diffusers is currently an experimental feature under active development. The API is subject to breaking changes in future releases.
(EngineCore_DP0 pid=5625) INFO 01-28 19:04:42 [core.py:644] Waiting for init message from front-end.
(EngineCore_DP0 pid=5625) INFO 01-28 19:04:42 [core.py:77] Initializing a V1 LLM engine (v0.11.0) with config: model='/ProjectRoot/Qwen3-VL/embedding/Qwen/Qwen3-VL-Embedding-8B', speculative_config=None, tokenizer='/ProjectRoot/Qwen3-VL/embedding/Qwen/Qwen3-VL-Embedding-8B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=262144, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=/ProjectRoot/Qwen3-VL/embedding/Qwen/Qwen3-VL-Embedding-8B, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=PoolerConfig(pooling_type='LAST', normalize=None, dimensions=None, enable_chunked_processing=None, max_embed_len=None, activation=None, logit_bias=None, softmax=None, step_tag_id=None, returned_token_ids=None), 
compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention","vllm.sparse_attn_indexer"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":1,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null}
(EngineCore_DP0 pid=5625) WARNING 01-28 19:04:42 [__init__.py:763] The environment variable HOST_IP is deprecated and ignored, as it is often used by Docker and other software to interact with the container's network stack. Please use VLLM_HOST_IP instead to set the IP address for vLLM processes to communicate with each other.
[W128 19:05:00.722262531 socket.cpp:755] [c10d] The client socket cannot be initialized to connect to [14-1-11-13.dl2-prod-instance-187.ide.svc.cluster.local]:53617 (errno: 97 - Address family not supported by protocol).
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(EngineCore_DP0 pid=5625) INFO 01-28 19:05:00 [parallel_state.py:1208] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
(EngineCore_DP0 pid=5625) INFO 01-28 19:05:00 [topk_topp_sampler.py:55] Using FlashInfer for top-p & top-k sampling.
(EngineCore_DP0 pid=5625) INFO 01-28 19:05:07 [gpu_model_runner.py:2602] Starting to load model /ProjectRoot/Qwen3-VL/embedding/Qwen/Qwen3-VL-Embedding-8B...
(EngineCore_DP0 pid=5625) INFO 01-28 19:05:07 [gpu_model_runner.py:2634] Loading model from scratch...
(EngineCore_DP0 pid=5625) INFO 01-28 19:05:07 [cuda.py:366] Using Flash Attention backend on V1 engine.
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] ST projector loading failed
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] Traceback (most recent call last):
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] File "/usr/local/lib/python3.11/site-packages/vllm/model_executor/models/adapters.py", line 40, in _load_st_projector
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] modules = get_hf_file_to_dict("modules.json", model_config.model,
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] File "/usr/local/lib/python3.11/site-packages/vllm/transformers_utils/config.py", line 687, in get_hf_file_to_dict
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] hf_hub_file = hf_hub_download(model, file_name, revision=revision)
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] File "/usr/local/lib/python3.11/site-packages/modelscope/utils/hf_util/patcher.py", line 531, in _file_download
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] return file_download(
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] File "/usr/local/lib/python3.11/site-packages/modelscope/hub/file_download.py", line 89, in model_file_download
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] return _repo_file_download(
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] ^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] File "/usr/local/lib/python3.11/site-packages/modelscope/hub/file_download.py", line 217, in _repo_file_download
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] endpoint = _api.get_endpoint_for_read(repo_id=repo_id, repo_type=repo_type)
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] File "/usr/local/lib/python3.11/site-packages/modelscope/hub/api.py", line 522, in get_endpoint_for_read
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] if not self.repo_exists(
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] ^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] File "/usr/local/lib/python3.11/site-packages/modelscope/hub/api.py", line 670, in repo_exists
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] raise Exception('Invalid repo_id: %s, must be of format namespace/name' % repo_type)
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] Exception: Invalid repo_id: model, must be of format namespace/name
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:35<01:47, 35.88s/it]
Loading safetensors checkpoint shards: 50% Completed | 2/4 [01:11<01:11, 35.68s/it]
Loading safetensors checkpoint shards: 75% Completed | 3/4 [01:47<00:35, 35.86s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [01:58<00:00, 26.02s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [01:58<00:00, 29.61s/it]
(EngineCore_DP0 pid=5625)
(EngineCore_DP0 pid=5625) INFO 01-28 19:07:06 [default_loader.py:267] Loading weights took 118.45 seconds
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] self._init_executor()
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 55, in _init_executor
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] self.collective_rpc("load_model")
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] return func(*args, **kwargs)
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 213, in load_model
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2635, in load_model
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] self.model = model_loader.load_model(
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] self.load_weights(model, model_config)
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/model_executor/model_loader/default_loader.py", line 276, in load_weights
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] raise ValueError("Following weights were not initialized from "
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] ValueError: Following weights were not initialized from checkpoint: {'language_model.lm_head.weight'}
(EngineCore_DP0 pid=5625) Process EngineCore_DP0:
(EngineCore_DP0 pid=5625) Traceback (most recent call last):
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=5625) self.run()
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=5625) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=5625) raise e
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=5625) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=5625) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=5625) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=5625) self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=5625) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=5625) self._init_executor()
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 55, in _init_executor
(EngineCore_DP0 pid=5625) self.collective_rpc("load_model")
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=5625) return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=5625) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=5625) return func(*args, **kwargs)
(EngineCore_DP0 pid=5625) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 213, in load_model
(EngineCore_DP0 pid=5625) self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2635, in load_model
(EngineCore_DP0 pid=5625) self.model = model_loader.load_model(
(EngineCore_DP0 pid=5625) ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(EngineCore_DP0 pid=5625) self.load_weights(model, model_config)
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/model_executor/model_loader/default_loader.py", line 276, in load_weights
(EngineCore_DP0 pid=5625) raise ValueError("Following weights were not initialized from "
(EngineCore_DP0 pid=5625) ValueError: Following weights were not initialized from checkpoint: {'language_model.lm_head.weight'}
[rank0]:[W128 19:07:07.772602278 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
Traceback (most recent call last):
File "/ProjectRoot/Qwen3-VL/embedding/infer_embed.py", line 164, in <module>
main()
File "/ProjectRoot/Qwen3-VL/embedding/infer_embed.py", line 136, in main
llm = LLM(**vars(engine_args))
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/vllm/entrypoints/llm.py", line 297, in __init__
self.llm_engine = LLMEngine.from_engine_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/llm_engine.py", line 177, in from_engine_args
return cls(vllm_config=vllm_config,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/llm_engine.py", line 114, in __init__
self.engine_core = EngineCoreClient.make_client(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 80, in make_client
return SyncMPClient(vllm_config, executor_class, log_stats)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 602, in __init__
super().__init__(
File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 448, in __init__
with launch_core_engines(vllm_config, executor_class,
File "/usr/local/lib/python3.11/contextlib.py", line 144, in __exit__
next(self.gen)
File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/utils.py", line 732, in launch_core_engines
wait_for_engine_startup(
File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
