Description
Not sure if this is a bug or if I'm just not running it correctly.
Steps to reproduce:
- Run `examples/quantization_w8a8_fp8/fp8_block_example.py` without changing the model (a minimal sketch of what the example does follows this list).
- Host the resulting model using the vLLM OpenAI container v0.10.0.
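For context, this is roughly what the example does, as I understand it: a minimal sketch assuming the example's `FP8_BLOCK` scheme; `MODEL_ID` is a placeholder for whatever model the example ships with.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "..."  # placeholder: the model the example uses, left unchanged

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# FP8_BLOCK quantizes Linear weights to FP8 in 128x128 blocks
# (which is where the serialized block_structure of [128, 128] comes from).
recipe = QuantizationModifier(targets="Linear", scheme="FP8_BLOCK", ignore=["lm_head"])

oneshot(model=model, recipe=recipe)

save_dir = MODEL_ID.split("/")[-1] + "-FP8-BLOCK"
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)
```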
```
INFO 08-01 08:06:00 [__init__.py:235] Automatically detected platform cuda.
INFO 08-01 08:06:02 [api_server.py:1755] vLLM API server version 0.10.1.dev1+gbcc0a3cbe
INFO 08-01 08:06:02 [cli_args.py:261] non-default args: {'model': '/opt/model'}
INFO 08-01 08:06:07 [config.py:1604] Using max model len 131072
INFO 08-01 08:06:08 [config.py:2434] Chunked prefill is enabled with max_num_batched_tokens=8192.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1856, in <module>
    uvloop.run(run_server(args))
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
    return __asyncio.run(
           ^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
           ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1791, in run_server
    await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1811, in run_server_worker
    async with build_async_engine_client(args, client_config) as engine_client:
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 158, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client_from_engine_args
    vllm_config = engine_args.create_engine_config(usage_context=usage_context)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/arg_utils.py", line 1277, in create_engine_config
    config = VllmConfig(
             ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 123, in __init__
    s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
pydantic_core._pydantic_core.ValidationError: 1 validation error for VllmConfig
block_structure
  Input should be a valid string [type=string_type, input_value=[128, 128], input_type=list]
    For further information visit https://errors.pydantic.dev/2.11/v/string_type
```
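The validation error suggests a serialization mismatch: llm-compressor wrote `block_structure` as a list (`[128, 128]`), while the config validation in the vLLM v0.10.0 container still expects the legacy `"RxC"` string form (e.g. `"128x128"`). A minimal workaround sketch, assuming the standard compressed-tensors `config_groups` layout in `config.json` (the `/opt/model` path is a placeholder for the saved checkpoint directory):

```python
import json

# Hedged workaround: rewrite the list form of block_structure back to the
# legacy "RxC" string that the older validator in the container accepts.
path = "/opt/model/config.json"
with open(path) as f:
    cfg = json.load(f)

for group in cfg.get("quantization_config", {}).get("config_groups", {}).values():
    weights = group.get("weights") or {}
    bs = weights.get("block_structure")
    if isinstance(bs, list):  # e.g. [128, 128] -> "128x128"
        weights["block_structure"] = "x".join(str(d) for d in bs)

with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
```

If that lets the model load, the root cause is presumably version skew between the llm-compressor dev build used for quantization and the compressed-tensors version pinned by the vLLM container.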
Also, the `collect_env` output:
### Environment Information ###
Operating System: `Linux-5.15.0-136-generic-x86_64-with-glibc2.35`
Python Version: `3.12.9 (main, Mar 17 2025, 21:01:58) [Clang 20.1.0 ]`
llm-compressor Version: `0.6.1.dev51+g0a20392c6.d20250801`
compressed-tensors Version: `0.10.2`
transformers Version: `4.54.1`
torch Version: `2.7.1`
CUDA Devices: `['NVIDIA H200']`
AMD Devices: `None`