
Using guidellm to benchmark vLLM (behind a LiteLLM proxy) fails with 500 Internal Server Error #308

@liyuerich

Description


Trying to use guidellm to benchmark a vLLM server behind a LiteLLM proxy fails.

1. Set up the vLLM service:
vllm serve /root/data/Qwen/Qwen2-0.5B-Instruct --port 8001 --served-model-name Qwen2-0.5B-Instruct

2. Set up the LiteLLM proxy:

litellm_config.yaml:

model_list:
  - model_name: "vllm-model-group-1"
    litellm_params:
      model: "hosted_vllm/Qwen2-0.5B-Instruct"
      api_base: "http://vllmserverIP:8001/v1"
      request_timeout: 120

litellm --config litellm_config.yaml --port 4000

3. Using curl to access the vLLM service and the LiteLLM proxy directly works fine in both cases.
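The exact curl commands were not included in the report; the direct checks were roughly equivalent to the following sketch (ports, endpoints, and model names are taken from the setup above; everything else is an assumption):

```python
import json
import urllib.request


def build_payload(model: str, prompt: str = "Test connection", max_tokens: int = 8) -> dict:
    """Minimal OpenAI-style /v1/completions request body."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}


def check_endpoint(base_url: str, model: str) -> dict:
    """POST a completion request to base_url and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(build_payload(model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Direct vLLM endpoint, then the LiteLLM proxy, as configured above.
    print(check_endpoint("http://localhost:8001", "Qwen2-0.5B-Instruct"))
    print(check_endpoint("http://localhost:4000", "vllm-model-group-1"))
```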

4. Running guidellm fails:

guidellm benchmark --target "http://localhost:4000" --rate-type sweep --max-seconds 30 --data "prompt_tokens=5,output_tokens=2"

guidellm benchmark --target "http://localhost:4000" --rate-type sweep --data "prompt_tokens=5,output_tokens=2"
Creating backend...
25-09-09 01:09:14|ERROR            |guidellm.backend.openai:text_completions:306 - OpenAIHTTPBackend request with headers: {'Content-Type': 'application/json'} and params: {} and payload: {'prompt': 'Test connection', 'model': 'vllm-model-group-1', 'stream': True, 'stream_options': {'include_usage': True}, 'max_tokens': 1, 'max_completion_tokens': 1, 'stop': None, 'ignore_eos': True} failed: Server error '500 Internal Server Error' for url 'http://localhost:4000/v1/completions'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500
Traceback (most recent call last):
  File "/usr/local/bin/guidellm", line 7, in <module>
    sys.exit(cli())
             ^^^^^
  File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/guidellm/__main__.py", line 314, in run
    asyncio.run(
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/guidellm/benchmark/entrypoints.py", line 29, in benchmark_with_scenario
    return await benchmark_generative_text(**vars(scenario), **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/guidellm/benchmark/entrypoints.py", line 71, in benchmark_generative_text
    await backend.validate()
  File "/usr/local/lib/python3.12/dist-packages/guidellm/backend/backend.py", line 143, in validate
    async for _ in self.text_completions(  # type: ignore[attr-defined]
  File "/usr/local/lib/python3.12/dist-packages/guidellm/backend/openai.py", line 314, in text_completions
    raise ex
  File "/usr/local/lib/python3.12/dist-packages/guidellm/backend/openai.py", line 295, in text_completions
    async for resp in self._iterative_completions_request(
  File "/usr/local/lib/python3.12/dist-packages/guidellm/backend/openai.py", line 605, in _iterative_completions_request
    stream.raise_for_status()
  File "/usr/local/lib/python3.12/dist-packages/httpx/_models.py", line 829, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Server error '500 Internal Server Error' for url 'http://localhost:4000/v1/completions'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500
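To isolate the failure, the request that guidellm sends during backend validation can be replayed against the proxy outside of guidellm. The payload below is copied verbatim from the error log above; whether the proxy accepts vLLM-specific fields such as `ignore_eos` and `stream_options` is an assumption worth checking against the LiteLLM proxy logs:

```python
import json

# Payload copied verbatim from the guidellm error log above.
GUIDELLM_PAYLOAD = {
    "prompt": "Test connection",
    "model": "vllm-model-group-1",
    "stream": True,
    "stream_options": {"include_usage": True},
    "max_tokens": 1,
    "max_completion_tokens": 1,
    "stop": None,
    "ignore_eos": True,
}

if __name__ == "__main__":
    # Print a curl command that replays the exact failing request against the
    # proxy, so the 500 can be reproduced without guidellm in the loop.
    body = json.dumps(GUIDELLM_PAYLOAD)
    print(
        "curl -sS http://localhost:4000/v1/completions "
        "-H 'Content-Type: application/json' "
        f"-d '{body}'"
    )
```

Dropping fields one at a time from the payload (starting with `ignore_eos` and `stream_options`) should show which one the proxy rejects.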


  
