This is an HTTP load testing tool built using Python's asyncio and aiohttp libraries. It allows you to benchmark the performance of a given URL by specifying the queries per second (QPS), number of workers, duration, and timeout.
- Fixed the `--qps` logic, which was incorrect. It worked fine for local APIs; however, when used with the Fireworks AI API or any other public URL like this one, the total number of requests would not equal `qps * duration`.
  - Core issue: the worker timed out via `await asyncio.sleep(duration)` followed by `task.cancel()`, which made the worker quit even while outgoing requests were still waiting to complete.
  - Fix: use `await asyncio.gather()` to wait for all requests to complete before doing the final processing of results. A `stop_flag` (an `asyncio.Event()`) was also added to terminate the worker gracefully; see the sketch after this list.
- The previous logic created an extra worker whenever `qps % num_workers > 0`; this has been changed to the default implementation, where the last worker is assigned the extra `qps % num_workers`.
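A minimal sketch of the resulting worker pattern, assuming illustrative names (`send_request`, `worker`, `run`) rather than the project's actual identifiers:

```python
import asyncio
import aiohttp

async def send_request(session: aiohttp.ClientSession, url: str, timeout: float):
    # Issue one request; report the status, or the exception text on failure.
    try:
        async with session.get(url, timeout=aiohttp.ClientTimeout(total=timeout)) as resp:
            await resp.read()
            return resp.status
    except Exception as exc:
        return repr(exc)

async def worker(url: str, qps: int, timeout: float, stop_flag: asyncio.Event):
    async with aiohttp.ClientSession() as session:
        tasks = []
        while not stop_flag.is_set():
            # Fire this worker's share of requests, then pace for ~1 second.
            tasks += [asyncio.create_task(send_request(session, url, timeout))
                      for _ in range(qps)]
            await asyncio.sleep(1)
        # Wait for in-flight requests to finish instead of cancelling them.
        return await asyncio.gather(*tasks)

async def run(url: str, qps: int, duration: int, num_workers: int, timeout: float):
    stop_flag = asyncio.Event()
    base, extra = divmod(qps, num_workers)
    # No extra worker any more: the last worker absorbs qps % num_workers.
    rates = [base] * (num_workers - 1) + [base + extra]
    workers = [asyncio.create_task(worker(url, rate, timeout, stop_flag))
               for rate in rates]
    await asyncio.sleep(duration)
    stop_flag.set()  # request a graceful stop; no task.cancel()
    per_worker = await asyncio.gather(*workers)
    return [result for batch in per_worker for result in batch]
```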
- Asynchronous HTTP requests using `aiohttp`.
- Customizable duration, number of workers, timeout, and QPS.
- Statistical reporting of API response time, including percentiles (50th, 90th, 97th, 99th), mean, and standard deviation.
- Error tracking for non-200 HTTP responses.
- If errors are encountered, a set of short descriptions of the exceptions is returned as `errors_status`.
- Metrics on API response time: the time taken to return the complete response to the request.
- Metrics on latency: the time taken to transfer data between the client and the server.
- All of the above for `response_time`, `latency`, and time-to-first-token.
- If `stream=True`, time-to-first-token is measured against the text completion API (made possible since `BOS_TOKEN` is returned as the first response). A sketch of how these timings can be taken follows this list.
- Configurable request details: `model`, `prompt`, `max_tokens`.
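A minimal sketch of how the three timings could be captured for a single streaming request and then summarized. The helper names, the use of `numpy`, and the exact latency definition are assumptions, not the project's actual implementation:

```python
import time
from typing import Dict, List, Optional, Tuple

import aiohttp
import numpy as np

async def timed_request(session: aiohttp.ClientSession, url: str,
                        payload: dict) -> Tuple[float, float, Optional[float]]:
    """Return (response_time, latency, time_to_first_token) for one call."""
    start = time.perf_counter()
    async with session.post(url, json=payload) as resp:
        latency = time.perf_counter() - start       # headers received
        ttft = None
        async for _chunk in resp.content.iter_any():
            if ttft is None:
                # First body bytes -- the BOS_TOKEN when stream=True.
                ttft = time.perf_counter() - start
    response_time = time.perf_counter() - start     # full body consumed
    return response_time, latency, ttft

def summarize(samples: List[float], name: str) -> Dict[str, float]:
    """Mean, standard deviation, and the reported percentiles for one metric."""
    arr = np.asarray(samples)
    stats = {f"mean_{name}": float(arr.mean()), f"std_{name}": float(arr.std())}
    for p in (50, 90, 97, 99):
        stats[f"{name}_p{p}"] = float(np.percentile(arr, p))
    return stats
```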
- Python 3.8+
- Docker
- **Installation with Docker**
  - Build the image: `docker build -t http-project .`
    - The above will install the `pip` dependencies and run the API on port `80` within the container.
  - To start a container from the image on port `8000` on your device: `docker run -p 8000:80 http-project`
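The repository's Dockerfile is not reproduced here; a minimal sketch consistent with the behaviour described above (install the `pip` dependencies, serve on port `80`) might look like the following, where the `uvicorn` entry point is an assumption:

```dockerfile
# Hypothetical Dockerfile; the repository's own file may differ.
FROM python:3.8-slim

WORKDIR /app
COPY requirements.txt .
RUN python -m pip install -r requirements.txt

COPY . .
# Serve the benchmark API on port 80 inside the container.
CMD ["uvicorn", "benchmark:app", "--host", "0.0.0.0", "--port", "80"]
```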
- **Local Installation**
  - Install the required libraries using pip: `python -m pip install -r requirements.txt`
  - Run either `benchmark.py` or `fireworks_ai_benchmark.py`: `python <file_name.py> <args>`
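For example (the flag names below are assumptions mirroring the JSON config fields; consult each script's argument parser for the actual interface):

```bash
python benchmark.py --url https://httpbin.org/get --qps 20 --duration 5 --num_workers 20 --timeout 0.25
```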
Simply run: `python -m unittest test_benchmark.py`

- To test the basic HTTP API, use the following `cURL` command:

  ```bash
  curl --location 'http://localhost:8000/benchmark' \
  --header 'Content-Type: application/json' \
  --data '{
      "url": "https://httpbin.org/get",
      "qps": 20,
      "duration": 5,
      "num_workers": 20,
      "timeout": 0.25
  }'
  ```
- Use `dummy_api.py` to test against a local endpoint by running: `uvicorn dummy_api:app --port 8081 --reload` (a minimal stand-in app is sketched after the sample response below).
- Sample Response:

  ```json
  {
      "config": {
          "url": "https://httpbin.org/get",
          "qps": 20,
          "duration": 5,
          "num_workers": 20,
          "timeout": 0.25
      },
      "total_requests": 100,
      "errors": 10,
      "mean_response_time": 0.09689875602722169,
      "std_response_time": 0.08430709021882068,
      "response_time_p50": 0.03715097904205322,
      "response_time_p90": 0.2193705320358277,
      "response_time_p97": 0.2523853826522827,
      "response_time_p99": 0.25324128866195683,
      "mean_latency": 0.08327518597893092,
      "std_latency": 0.07355851797061785,
      "latency_p50": 0.036652207374572754,
      "latency_p90": 0.21138486862182618,
      "latency_p97": 0.21849927425384522,
      "latency_p99": 0.22762411594390872,
      "errors_status": [
          "",
          "Request not completed"
      ]
  }
  ```
- To test the Fireworks AI API, use the following `cURL` command:
  - [NOTE] The `token` field must be passed in the request; it contains your API key from Fireworks AI.

  ```bash
  curl --location 'http://localhost:8000/fireworks_benchmark' \
  --header 'Content-Type: application/json' \
  --data '{
      "qps": 10,
      "duration": 5,
      "token": "<FIREWORKS_API_KEY>",
      "model": "accounts/fireworks/models/llama-v3-8b-instruct-hf",
      "max_tokens": 25,
      "prompt": "Snow is white",
      "url": "https://api.fireworks.ai/inference/v1/completions",
      "stream": "True",
      "num_workers": 10,
      "timeout": 1.5
  }'
  ```
- Sample Response:

  ```json
  {
      "config": {
          "fireworks_payload": {
              "model": "accounts/fireworks/models/llama-v3-8b-instruct-hf",
              "prompt": "Snow is white",
              "max_tokens": 25,
              "logprobs": 2,
              "echo": true,
              "temperature": 1,
              "top_p": 1,
              "top_k": 50,
              "frequency_penalty": 0,
              "presence_penalty": 0,
              "n": 1,
              "stop": "<string>",
              "stream": true,
              "context_length_exceeded_behavior": "truncate",
              "user": "<string>"
          },
          "qps": 10,
          "duration": 5,
          "url": "https://api.fireworks.ai/inference/v1/completions",
          "num_workers": 10,
          "timeout": 1.5
      },
      "total_requests": 50,
      "errors": 17,
      "mean_response_time": 1.256036820411682,
      "std_response_time": 0.24853412813865153,
      "response_time_p50": 1.35898756980896,
      "response_time_p90": 1.5038734197616577,
      "response_time_p97": 1.5060858416557312,
      "response_time_p99": 1.506537208557129,
      "mean_latency": 0.46938368853400736,
      "std_latency": 0.2533548911879965,
      "latency_p50": 0.4923844337463379,
      "latency_p90": 0.8658977508544922,
      "latency_p97": 0.8710399532318115,
      "latency_p99": 0.8713918018341065,
      "mean_time_to_first_token": 0.4694188061882468,
      "std_time_to_first_token": 0.25335066457698385,
      "time_to_first_token_p50": 0.49241387844085693,
      "time_to_first_token_p90": 0.8659332036972046,
      "time_to_first_token_p97": 0.8710464000701904,
      "time_to_first_token_p99": 0.8713988780975341,
      "errors_status": [
          "",
          "Request not completed"
      ]
  }
  ```