
VL-Embedding-2B works, but loading the 8B model fails #48

@wincle

Description


vLLM version: 0.11.0
The error output is as follows:

/usr/local/lib/python3.11/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml # type: ignore[import]
INFO 01-28 19:04:16 [__init__.py:216] Automatically detected platform cuda.
Modular Diffusers is currently an experimental feature under active development. The API is subject to breaking changes in future releases.
Loading model from /ProjectRoot/Qwen3-VL/embedding/Qwen/Qwen3-VL-Embedding-8B...
INFO 01-28 19:04:23 [utils.py:233] non-default args: {'runner': 'pooling', 'trust_remote_code': True, 'dtype': 'bfloat16', 'model': '/ProjectRoot/Qwen3-VL/embedding/Qwen/Qwen3-VL-Embedding-8B'}
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
INFO 01-28 19:04:32 [model.py:833] Resolved --convert auto to --convert embed. Pass the value explicitly to silence this message.
INFO 01-28 19:04:32 [model.py:547] Resolved architecture: Qwen3VLForConditionalGeneration
torch_dtype is deprecated! Use dtype instead!
INFO 01-28 19:04:32 [model.py:1510] Using max model len 262144
INFO 01-28 19:04:32 [arg_utils.py:1575] (Enabling) chunked prefill by default
INFO 01-28 19:04:32 [arg_utils.py:1578] (Enabling) prefix caching by default
INFO 01-28 19:04:32 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=8192.
WARNING 01-28 19:04:33 [__init__.py:3036] We must use the spawn multiprocessing start method. Overriding VLLM_WORKER_MULTIPROC_METHOD to 'spawn'. See https://docs.vllm.ai/en/latest/usage/troubleshooting.html#python-multiprocessing for more information. Reasons: CUDA is initialized
/usr/local/lib/python3.11/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml # type: ignore[import]
INFO 01-28 19:04:36 [__init__.py:216] Automatically detected platform cuda.
Modular Diffusers is currently an experimental feature under active development. The API is subject to breaking changes in future releases.
(EngineCore_DP0 pid=5625) INFO 01-28 19:04:42 [core.py:644] Waiting for init message from front-end.
(EngineCore_DP0 pid=5625) INFO 01-28 19:04:42 [core.py:77] Initializing a V1 LLM engine (v0.11.0) with config: model='/ProjectRoot/Qwen3-VL/embedding/Qwen/Qwen3-VL-Embedding-8B', speculative_config=None, tokenizer='/ProjectRoot/Qwen3-VL/embedding/Qwen/Qwen3-VL-Embedding-8B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=262144, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=/ProjectRoot/Qwen3-VL/embedding/Qwen/Qwen3-VL-Embedding-8B, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=PoolerConfig(pooling_type='LAST', normalize=None, dimensions=None, enable_chunked_processing=None, max_embed_len=None, activation=None, logit_bias=None, softmax=None, step_tag_id=None, returned_token_ids=None), 
compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention","vllm.sparse_attn_indexer"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":1,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null}
(EngineCore_DP0 pid=5625) WARNING 01-28 19:04:42 [__init__.py:763] The environment variable HOST_IP is deprecated and ignored, as it is often used by Docker and other software to interact with the container's network stack. Please use VLLM_HOST_IP instead to set the IP address for vLLM processes to communicate with each other.
[W128 19:05:00.722262531 socket.cpp:755] [c10d] The client socket cannot be initialized to connect to [14-1-11-13.dl2-prod-instance-187.ide.svc.cluster.local]:53617 (errno: 97 - Address family not supported by protocol).
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(EngineCore_DP0 pid=5625) INFO 01-28 19:05:00 [parallel_state.py:1208] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
(EngineCore_DP0 pid=5625) INFO 01-28 19:05:00 [topk_topp_sampler.py:55] Using FlashInfer for top-p & top-k sampling.
(EngineCore_DP0 pid=5625) INFO 01-28 19:05:07 [gpu_model_runner.py:2602] Starting to load model /ProjectRoot/Qwen3-VL/embedding/Qwen/Qwen3-VL-Embedding-8B...
(EngineCore_DP0 pid=5625) INFO 01-28 19:05:07 [gpu_model_runner.py:2634] Loading model from scratch...
(EngineCore_DP0 pid=5625) INFO 01-28 19:05:07 [cuda.py:366] Using Flash Attention backend on V1 engine.
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] ST projector loading failed
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] Traceback (most recent call last):
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] File "/usr/local/lib/python3.11/site-packages/vllm/model_executor/models/adapters.py", line 40, in _load_st_projector
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] modules = get_hf_file_to_dict("modules.json", model_config.model,
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] File "/usr/local/lib/python3.11/site-packages/vllm/transformers_utils/config.py", line 687, in get_hf_file_to_dict
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] hf_hub_file = hf_hub_download(model, file_name, revision=revision)
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] File "/usr/local/lib/python3.11/site-packages/modelscope/utils/hf_util/patcher.py", line 531, in _file_download
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] return file_download(
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] File "/usr/local/lib/python3.11/site-packages/modelscope/hub/file_download.py", line 89, in model_file_download
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] return _repo_file_download(
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] ^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] File "/usr/local/lib/python3.11/site-packages/modelscope/hub/file_download.py", line 217, in _repo_file_download
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] endpoint = _api.get_endpoint_for_read(repo_id=repo_id, repo_type=repo_type)
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] File "/usr/local/lib/python3.11/site-packages/modelscope/hub/api.py", line 522, in get_endpoint_for_read
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] if not self.repo_exists(
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] ^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] File "/usr/local/lib/python3.11/site-packages/modelscope/hub/api.py", line 670, in repo_exists
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] raise Exception('Invalid repo_id: %s, must be of format namespace/name' % repo_type)
(EngineCore_DP0 pid=5625) ERROR 01-28 19:05:08 [adapters.py:78] Exception: Invalid repo_id: model, must be of format namespace/name
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:35<01:47, 35.88s/it]
Loading safetensors checkpoint shards: 50% Completed | 2/4 [01:11<01:11, 35.68s/it]
Loading safetensors checkpoint shards: 75% Completed | 3/4 [01:47<00:35, 35.86s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [01:58<00:00, 26.02s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [01:58<00:00, 29.61s/it]
(EngineCore_DP0 pid=5625)
(EngineCore_DP0 pid=5625) INFO 01-28 19:07:06 [default_loader.py:267] Loading weights took 118.45 seconds
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] self._init_executor()
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 55, in _init_executor
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] self.collective_rpc("load_model")
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] return func(*args, **kwargs)
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 213, in load_model
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2635, in load_model
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] self.model = model_loader.load_model(
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] self.load_weights(model, model_config)
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] File "/usr/local/lib/python3.11/site-packages/vllm/model_executor/model_loader/default_loader.py", line 276, in load_weights
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] raise ValueError("Following weights were not initialized from "
(EngineCore_DP0 pid=5625) ERROR 01-28 19:07:07 [core.py:708] ValueError: Following weights were not initialized from checkpoint: {'language_model.lm_head.weight'}
(EngineCore_DP0 pid=5625) Process EngineCore_DP0:
(EngineCore_DP0 pid=5625) Traceback (most recent call last):
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=5625) self.run()
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=5625) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=5625) raise e
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=5625) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=5625) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=5625) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=5625) self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=5625) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=5625) self._init_executor()
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 55, in _init_executor
(EngineCore_DP0 pid=5625) self.collective_rpc("load_model")
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=5625) return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=5625) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=5625) return func(*args, **kwargs)
(EngineCore_DP0 pid=5625) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 213, in load_model
(EngineCore_DP0 pid=5625) self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2635, in load_model
(EngineCore_DP0 pid=5625) self.model = model_loader.load_model(
(EngineCore_DP0 pid=5625) ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(EngineCore_DP0 pid=5625) self.load_weights(model, model_config)
(EngineCore_DP0 pid=5625) File "/usr/local/lib/python3.11/site-packages/vllm/model_executor/model_loader/default_loader.py", line 276, in load_weights
(EngineCore_DP0 pid=5625) raise ValueError("Following weights were not initialized from "
(EngineCore_DP0 pid=5625) ValueError: Following weights were not initialized from checkpoint: {'language_model.lm_head.weight'}
[rank0]:[W128 19:07:07.772602278 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
Traceback (most recent call last):
File "/ProjectRoot/Qwen3-VL/embedding/infer_embed.py", line 164, in <module>
main()
File "/ProjectRoot/Qwen3-VL/embedding/infer_embed.py", line 136, in main
llm = LLM(**vars(engine_args))
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/vllm/entrypoints/llm.py", line 297, in __init__
self.llm_engine = LLMEngine.from_engine_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/llm_engine.py", line 177, in from_engine_args
return cls(vllm_config=vllm_config,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/llm_engine.py", line 114, in __init__
self.engine_core = EngineCoreClient.make_client(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 80, in make_client
return SyncMPClient(vllm_config, executor_class, log_stats)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 602, in __init__
super().__init__(
File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/core_client.py", line 448, in __init__
with launch_core_engines(vllm_config, executor_class,
File "/usr/local/lib/python3.11/contextlib.py", line 144, in __exit__
next(self.gen)
File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/utils.py", line 732, in launch_core_engines
wait_for_engine_startup(
File "/usr/local/lib/python3.11/site-packages/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
