
Using guidellm to benchmark vLLM (behind a LiteLLM proxy) fails with 500 Internal Server Error #308

@liyuerich

Description


Trying to use guidellm to benchmark a vLLM server behind a LiteLLM proxy fails.

1. Set up the vLLM service:
vllm serve /root/data/Qwen/Qwen2-0.5B-Instruct --port 8001 --served-model-name Qwen2-0.5B-Instruct

2. Set up the LiteLLM proxy:

litellm_config.yaml:

model_list:
  - model_name: "vllm-model-group-1"
    litellm_params:
      model: "hosted_vllm/Qwen2-0.5B-Instruct"
      api_base: "http://vllmserverIP:8001/v1"
      request_timeout: 120

litellm --config litellm_config.yaml --port 4000

3. Using curl to access the vLLM service and the LiteLLM proxy directly works fine in both cases.
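The exact curl commands were not included in the report; the direct checks were roughly equivalent to the following sketch (ports, endpoints, and model names are taken from the setup above; everything else is an assumption):

```python
import json
import urllib.request


def build_payload(model: str, prompt: str = "Test connection", max_tokens: int = 8) -> dict:
    """Minimal OpenAI-style /v1/completions request body."""
    return {"model": model, "prompt": prompt, "max_tokens": max_tokens}


def check_endpoint(base_url: str, model: str) -> dict:
    """POST a completion request to base_url and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(build_payload(model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Direct vLLM endpoint, then the LiteLLM proxy, as configured above.
    print(check_endpoint("http://localhost:8001", "Qwen2-0.5B-Instruct"))
    print(check_endpoint("http://localhost:4000", "vllm-model-group-1"))
```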

4. Running guidellm fails:

guidellm benchmark --target "http://localhost:4000" --rate-type sweep --max-seconds 30 --data "prompt_tokens=5,output_tokens=2"

guidellm benchmark --target "http://localhost:4000" --rate-type sweep --data "prompt_tokens=5,output_tokens=2"
Creating backend...
25-09-09 01:09:14|ERROR            |guidellm.backend.openai:text_completions:306 - OpenAIHTTPBackend request with headers: {'Content-Type': 'application/json'} and params: {} and payload: {'prompt': 'Test connection', 'model': 'vllm-model-group-1', 'stream': True, 'stream_options': {'include_usage': True}, 'max_tokens': 1, 'max_completion_tokens': 1, 'stop': None, 'ignore_eos': True} failed: Server error '500 Internal Server Error' for url 'http://localhost:4000/v1/completions'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500
Traceback (most recent call last):
  File "/usr/local/bin/guidellm", line 7, in <module>
    sys.exit(cli())
             ^^^^^
  File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/guidellm/__main__.py", line 314, in run
    asyncio.run(
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/guidellm/benchmark/entrypoints.py", line 29, in benchmark_with_scenario
    return await benchmark_generative_text(**vars(scenario), **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/guidellm/benchmark/entrypoints.py", line 71, in benchmark_generative_text
    await backend.validate()
  File "/usr/local/lib/python3.12/dist-packages/guidellm/backend/backend.py", line 143, in validate
    async for _ in self.text_completions(  # type: ignore[attr-defined]
  File "/usr/local/lib/python3.12/dist-packages/guidellm/backend/openai.py", line 314, in text_completions
    raise ex
  File "/usr/local/lib/python3.12/dist-packages/guidellm/backend/openai.py", line 295, in text_completions
    async for resp in self._iterative_completions_request(
  File "/usr/local/lib/python3.12/dist-packages/guidellm/backend/openai.py", line 605, in _iterative_completions_request
    stream.raise_for_status()
  File "/usr/local/lib/python3.12/dist-packages/httpx/_models.py", line 829, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Server error '500 Internal Server Error' for url 'http://localhost:4000/v1/completions'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/500
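To isolate the failure, the request that guidellm sends during backend validation can be replayed against the proxy outside of guidellm. The payload below is copied verbatim from the error log above; whether the proxy accepts vLLM-specific fields such as `ignore_eos` and `stream_options` is an assumption worth checking against the LiteLLM proxy logs:

```python
import json

# Payload copied verbatim from the guidellm error log above.
GUIDELLM_PAYLOAD = {
    "prompt": "Test connection",
    "model": "vllm-model-group-1",
    "stream": True,
    "stream_options": {"include_usage": True},
    "max_tokens": 1,
    "max_completion_tokens": 1,
    "stop": None,
    "ignore_eos": True,
}

if __name__ == "__main__":
    # Print a curl command that replays the exact failing request against the
    # proxy, so the 500 can be reproduced without guidellm in the loop.
    body = json.dumps(GUIDELLM_PAYLOAD)
    print(
        "curl -sS http://localhost:4000/v1/completions "
        "-H 'Content-Type: application/json' "
        f"-d '{body}'"
    )
```

Dropping fields one at a time from the payload (starting with `ignore_eos` and `stream_options`) should show which one the proxy rejects.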


  
