feat(benchmark): Add Procfile to run Guardrails and mock LLMs #1490
base: develop
Conversation
Greptile Overview
Greptile Summary
This PR adds infrastructure for running Guardrails with mock LLM servers for benchmarking and testing. The main changes reorganize configuration files and add orchestration tooling.
Key Changes:
- Added `Procfile` to orchestrate three services: Guardrails server (port 9000), app LLM mock (port 8000), and content safety LLM mock (port 8001)
- Created `validate_mocks.py` script to verify all services are healthy and correctly configured
- Fixed module path in `run_server.py` from relative (`api:app`) to absolute (`nemoguardrails.benchmark.mock_llm_server.api:app`) to enable running from any directory
- Reorganized configs from `mock_llm_server/configs/` to `benchmark/configs/` with clearer separation between guardrail configs and mock configs
Architecture:
The setup creates a complete local testing environment where the Guardrails server uses two mock LLMs: one for the main application (4s latency) and one for content safety checks (0.5s latency). This allows benchmarking without external API dependencies.
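To make the latency difference concrete, here is a minimal sketch of a latency-injecting mock endpoint (a hypothetical illustration, not the actual mock_llm_server code; the `RESPONSE_DELAY_SECONDS` variable name is an assumption):

```python
# Hypothetical sketch of a latency-injecting mock LLM endpoint; the actual
# mock_llm_server implementation and its config names may differ.
import asyncio
import os

from fastapi import FastAPI

app = FastAPI()

# e.g. 4.0 for the app LLM mock, 0.5 for the content safety mock
RESPONSE_DELAY_SECONDS = float(os.environ.get("RESPONSE_DELAY_SECONDS", "4.0"))


@app.post("/v1/chat/completions")
async def chat_completions(payload: dict) -> dict:
    # Simulate model inference time without blocking the event loop
    await asyncio.sleep(RESPONSE_DELAY_SECONDS)
    return {
        "object": "chat.completion",
        "choices": [
            {"index": 0, "message": {"role": "assistant", "content": "Mock response"}}
        ],
    }
```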
Confidence Score: 5/5
- This PR is safe to merge with minimal risk - it's well-structured infrastructure code for testing
- The changes are purely additive (new files and file moves) with one critical bug fix to make the mock server runnable from any directory. The code is well-documented, includes proper error handling in the validation script, and follows Python best practices. No production code is modified.
- No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| nemoguardrails/benchmark/Procfile | 5/5 | Added Procfile to orchestrate three services: Guardrails server on port 9000, app LLM mock on 8000, and content safety LLM mock on 8001 |
| nemoguardrails/benchmark/validate_mocks.py | 5/5 | New validation script that checks health and model endpoints for all three services |
| nemoguardrails/benchmark/mock_llm_server/run_server.py | 5/5 | Updated uvicorn module path from relative to absolute: `api:app` → `nemoguardrails.benchmark.mock_llm_server.api:app` |
Sequence Diagram
```
sequenceDiagram
participant User
participant Procfile
participant GR as Guardrails Server<br/>(port 9000)
participant AppLLM as App LLM Mock<br/>(port 8000)
participant CSLLM as Content Safety Mock<br/>(port 8001)
participant Val as validate_mocks.py
Note over Procfile: Orchestrates 3 services
Procfile->>GR: Start nemoguardrails server<br/>--config configs/guardrail_configs
Procfile->>AppLLM: Start mock_llm_server<br/>meta-llama-3.3-70b-instruct
Procfile->>CSLLM: Start mock_llm_server<br/>nemoguard-8b-content-safety
Note over Val: Validation Script
Val->>AppLLM: GET /health
AppLLM-->>Val: {"status": "healthy"}
Val->>AppLLM: GET /v1/models
AppLLM-->>Val: [meta/llama-3.3-70b-instruct]
Val->>CSLLM: GET /health
CSLLM-->>Val: {"status": "healthy"}
Val->>CSLLM: GET /v1/models
CSLLM-->>Val: [nvidia/llama-3.1-nemoguard-8b-content-safety]
Val->>GR: GET /v1/rails/configs
GR-->>Val: [content_safety_colang1]
Note over User,CSLLM: Runtime Flow
User->>GR: POST /v1/chat/completions
GR->>CSLLM: Check input safety
CSLLM-->>GR: Safety result (0.5s)
GR->>AppLLM: Generate response
AppLLM-->>GR: LLM response (4s)
GR->>CSLLM: Check output safety
CSLLM-->>GR: Safety result (0.5s)
GR-->>User: Guarded response
```
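The runtime flow at the bottom of the diagram corresponds to a client call like the following (a sketch; the payload shape assumes the standard Guardrails server chat API, with the `config_id` taken from the validation step above):

```python
# Sketch of a client request that exercises the guarded flow above.
# Assumes the standard Guardrails server chat completions payload shape.
import httpx

response = httpx.post(
    "http://localhost:9000/v1/chat/completions",
    json={
        "config_id": "content_safety_colang1",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=30.0,  # the app LLM mock alone adds ~4s of latency
)
print(response.json())
```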
7 files reviewed, no comments
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Greptile Overview
Greptile Summary
Added infrastructure to simplify the benchmark setup by introducing a Procfile for running Guardrails and mock LLM servers with a single command using honcho. This replaces the previous 4-tmux-pane manual setup.
Key changes:
- Added `Procfile` to orchestrate Guardrails server and two mock LLM servers (app_llm on port 8000, cs_llm on port 8001)
- Added `validate_mocks.py` script to verify mock LLM health endpoints and Guardrails config endpoint
- Restructured configs into `guardrail_configs` and `mock_configs` directories
- All servers now start with `poetry run honcho start` from the benchmark directory
The changes follow established patterns from the related PR #1403 and provide better developer experience for benchmarking workflows.
Confidence Score: 4/5
- This PR is safe to merge with low risk
- The changes add new benchmark tooling without modifying core Guardrails functionality. The validation script is well-structured with proper error handling, timeout protection, and clear logging. All tests pass and pre-commit checks succeed. The only minor consideration is the hardcoded ports and model names in validate_mocks.py, which reduces flexibility but is acceptable for a development/testing tool.
- No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| nemoguardrails/benchmark/validate_mocks.py | 4/5 | New validation script to health check mock LLM servers and Guardrails config endpoint |
Sequence Diagram
```
sequenceDiagram
participant User
participant Honcho
participant GR as Guardrails Server<br/>(port 9000)
participant APP as App LLM Mock<br/>(port 8000)
participant CS as Content Safety Mock<br/>(port 8001)
participant Validator as validate_mocks.py
User->>Honcho: poetry run honcho start
Honcho->>GR: Start Guardrails server
Honcho->>APP: Start App LLM mock
Honcho->>CS: Start Content Safety mock
Note over GR,CS: All services start concurrently
GR-->>Honcho: Server ready (9000)
APP-->>Honcho: Server ready (8000)
CS-->>Honcho: Server ready (8001)
User->>Validator: poetry run python validate_mocks.py
Validator->>APP: GET /health
APP-->>Validator: {"status": "healthy"}
Validator->>APP: GET /v1/models
APP-->>Validator: model list
Validator->>CS: GET /health
CS-->>Validator: {"status": "healthy"}
Validator->>CS: GET /v1/models
CS-->>Validator: model list
Validator->>GR: GET /v1/rails/configs
GR-->>Validator: configs array
Validator-->>User: All endpoints healthy!
```
1 file reviewed, no comments
Greptile Overview
Greptile Summary
This PR simplifies the benchmark setup process by adding infrastructure to run Guardrails and mock LLM servers together. The key changes include:
- Procfile: Added to orchestrate 3 services (Guardrails server on port 9000, App LLM mock on port 8000, Content Safety LLM mock on port 8001) using the honcho package
- Config restructuring: Moved mock and Guardrails configs from `mock_llm_server/configs/` to `benchmark/configs/` for better organization
- Validation script: Added `validate_mocks.py` to verify health and model endpoints of all services
- Import fix: Updated `run_server.py` to use the fully qualified module path (`nemoguardrails.benchmark.mock_llm_server.api:app`) instead of a relative import
- Logging improvement: Converted f-string logging statements to `%s` format, the Python logging best practice that avoids unnecessary string interpolation (see the example below)
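For illustration, the difference between the two logging styles looks like this (a generic Python example, not code from the PR):

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
status = "healthy"

# f-string: the message is built eagerly, even if this log level is disabled
logger.info(f"Service status: {status}")

# %s style: interpolation is deferred until the record is actually emitted
logger.info("Service status: %s", status)
```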
The PR successfully reduces the complexity of starting the benchmark environment from 4 separate tmux panes to a single honcho start command, improving developer experience.
Confidence Score: 5/5
- This PR is safe to merge with minimal risk
- Score reflects simple infrastructure changes that improve developer workflow. The validation script follows best practices (`%s`-style logging, proper error handling, clear exit codes). All tests pass, pre-commit checks pass, and the test plan demonstrates working functionality. The only minor note is that the `requests` library needs to be installed separately, as documented.
- No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| nemoguardrails/benchmark/validate_mocks.py | 5/5 | Converted f-string logging statements to %s format (best practice for logging). Script validates health of Mock LLM servers and Guardrails endpoints. |
Sequence Diagram
```
sequenceDiagram
participant User
participant ValidateMocks as validate_mocks.py
participant AppLLM as Mock LLM (8000)
participant CSLLM as Content Safety LLM (8001)
participant Guardrails as Guardrails (9000)
User->>ValidateMocks: Run validation script
ValidateMocks->>AppLLM: GET /health
AppLLM-->>ValidateMocks: {"status": "healthy"}
ValidateMocks->>AppLLM: GET /v1/models
AppLLM-->>ValidateMocks: {"data": [{"id": "meta/llama-3.3-70b-instruct"}]}
ValidateMocks->>CSLLM: GET /health
CSLLM-->>ValidateMocks: {"status": "healthy"}
ValidateMocks->>CSLLM: GET /v1/models
CSLLM-->>ValidateMocks: {"data": [{"id": "nvidia/llama-3.1-nemoguard-8b-content-safety"}]}
ValidateMocks->>Guardrails: GET /v1/rails/configs
Guardrails-->>ValidateMocks: [config_array]
ValidateMocks->>User: Report all checks passed (exit 0)
```
1 file reviewed, 1 comment
Greptile Overview
Greptile Summary
Adds a comprehensive test suite for the validate_mocks.py validation script. The tests thoroughly cover health checks, model validation, and Rails endpoint verification with proper mocking.
Key changes:
- Comprehensive test coverage for the `check_endpoint()` function, including success, failure, timeout, and JSON parsing error cases
- Full test coverage for the `check_rails_endpoint()` function with various response scenarios
- Tests for the `main()` function verifying proper exit codes and error handling
- Script execution tests to verify the module can be imported and run directly
Issue found:
- Line 464 contains a hardcoded absolute path that will fail in CI/CD and other environments
Confidence Score: 4/5
- Safe to merge after fixing the hardcoded path issue on line 464
- The test suite is comprehensive and well-structured with excellent coverage of edge cases. However, the hardcoded absolute path on line 464 will cause test failures in CI/CD environments and on different developer machines, preventing the test from running successfully
- tests/benchmark/test_validate_mocks.py:464 requires fixing the hardcoded path
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| tests/benchmark/test_validate_mocks.py | 4/5 | Comprehensive test suite for validate_mocks.py with one critical hardcoded path issue (line 464) |
Sequence Diagram
```
sequenceDiagram
participant Test as Test Suite
participant Mock as Mock requests.get
participant CheckEP as check_endpoint()
participant CheckRails as check_rails_endpoint()
participant Main as main()
Test->>Mock: Setup mock responses
Test->>CheckEP: Call with port & model
CheckEP->>Mock: GET /health
Mock-->>CheckEP: {"status": "healthy"}
CheckEP->>Mock: GET /v1/models
Mock-->>CheckEP: {"data": [{"id": "model"}]}
CheckEP-->>Test: (success, summary)
Test->>CheckRails: Call with port
CheckRails->>Mock: GET /v1/rails/configs
Mock-->>CheckRails: [config1, config2]
CheckRails-->>Test: (success, summary)
Test->>Main: Call main()
Main->>CheckEP: Check port 8000
Main->>CheckEP: Check port 8001
Main->>CheckRails: Check port 9000
Main-->>Test: sys.exit(0 or 1)
```
1 file reviewed, 1 comment
Greptile Overview
Greptile Summary
Added comprehensive unit tests for validate_mocks.py script that validates mock LLM endpoints and Guardrails configuration. The tests cover:
- `check_endpoint` function: 14 test cases covering health checks, model validation, connection errors, timeouts, and JSON parsing errors
- `check_rails_endpoint` function: 7 test cases covering Rails config validation, response structure checks, and error handling
- `main` function: 4 test cases covering success and failure scenarios with proper exit code validation
All tests use proper mocking with unittest.mock to avoid external dependencies. Tests follow pytest conventions and include clear docstrings. The test file matches the structure and error handling paths in the source code.
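As an illustration of that mocking pattern, one such test might look like this (a hedged sketch; the exact `check_endpoint` signature and patch target in the real test file may differ):

```python
# Hypothetical sketch of one test case; assumes validate_mocks imports
# requests and that check_endpoint(port, model) returns (success, summary).
from unittest.mock import MagicMock, patch

from nemoguardrails.benchmark.validate_mocks import check_endpoint


@patch("nemoguardrails.benchmark.validate_mocks.requests.get")
def test_check_endpoint_healthy(mock_get):
    # First call answers GET /health, second call answers GET /v1/models
    health = MagicMock(status_code=200)
    health.json.return_value = {"status": "healthy"}
    models = MagicMock(status_code=200)
    models.json.return_value = {"data": [{"id": "meta/llama-3.3-70b-instruct"}]}
    mock_get.side_effect = [health, models]

    success, summary = check_endpoint(8000, "meta/llama-3.3-70b-instruct")

    assert success
```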
Confidence Score: 5/5
- This PR is safe to merge with minimal risk
- The changes add comprehensive unit tests for the validate_mocks.py script with excellent coverage of edge cases, error handling, and success paths. All tests use proper mocking to avoid external dependencies, follow pytest conventions, and the PR description shows tests pass successfully (2054 passed, 125 skipped). The test code is well-structured, properly documented, and thoroughly covers all functions in the source file.
- No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| tests/benchmark/test_validate_mocks.py | 5/5 | New comprehensive unit test file covering all functions in validate_mocks.py with extensive edge case handling |
Sequence Diagram
```
sequenceDiagram
participant User
participant Honcho
participant GR as Guardrails Server<br/>(port 9000)
participant AppLLM as App LLM Mock<br/>(port 8000)
participant CSLLM as Content Safety LLM Mock<br/>(port 8001)
participant Script as validate_mocks.py
User->>Honcho: honcho start (reads Procfile)
Honcho->>GR: Start Guardrails server
Honcho->>AppLLM: Start App LLM mock
Honcho->>CSLLM: Start Content Safety LLM mock
Note over GR,CSLLM: All services start in parallel
User->>Script: python validate_mocks.py
Script->>AppLLM: GET /health
AppLLM-->>Script: 200 OK {"status": "healthy"}
Script->>AppLLM: GET /v1/models
AppLLM-->>Script: 200 OK {"data": [{"id": "meta/llama-3.3-70b-instruct"}]}
Script->>CSLLM: GET /health
CSLLM-->>Script: 200 OK {"status": "healthy"}
Script->>CSLLM: GET /v1/models
CSLLM-->>Script: 200 OK {"data": [{"id": "nvidia/llama-3.1-nemoguard-8b-content-safety"}]}
Script->>GR: GET /v1/rails/configs
GR-->>Script: 200 OK [{"id": "content_safety_colang1"}]
Script-->>User: All checks passed (exit 0)
```
1 file reviewed, no comments
Greptile Overview
Greptile Summary
This PR adds a new benchmark extras category to pyproject.toml to install honcho and requests packages. These dependencies support the benchmarking infrastructure introduced in PR #1403, allowing users to run Guardrails and mock LLMs using a Procfile with a single command.
Key Changes:
- Added `requests >= 2.31.0` and `honcho >= 1.1.0` as optional dependencies
- Created a new `benchmark` extras group for easy installation via `pip install nemoguardrails[benchmark]`
- Updated the `all` extras group to include benchmark dependencies
- Updated `poetry.lock` with dependency resolution (includes version bumps for `aiofiles` and `aiohttp`)
The changes are minimal, well-structured, and follow existing patterns in the project.
Confidence Score: 5/5
- This PR is safe to merge with minimal risk
- The changes only add optional dependencies to pyproject.toml and update the lock file. No production code is modified. The dependencies are well-established packages (requests and honcho) with appropriate version constraints. Tests exist for the code that uses these dependencies.
- No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| pyproject.toml | 5/5 | Adds benchmark extras group with requests and honcho dependencies for benchmark tooling. Changes are well-structured and follow existing patterns. |
| poetry.lock | 5/5 | Lock file updated to include honcho and requests dependencies with their transitive dependencies. Standard poetry lock file update. |
Sequence Diagram
```
sequenceDiagram
participant User
participant pip/poetry
participant pyproject.toml
participant poetry.lock
participant honcho
participant validate_mocks.py
User->>pip/poetry: pip install nemoguardrails[benchmark]
pip/poetry->>pyproject.toml: Read benchmark extras
pyproject.toml-->>pip/poetry: requests>=2.31.0, honcho>=1.1.0
pip/poetry->>poetry.lock: Resolve dependencies
poetry.lock-->>pip/poetry: Resolved versions
pip/poetry->>User: Installation complete
User->>honcho: honcho start (via Procfile)
honcho->>honcho: Start Guardrails server
honcho->>honcho: Start Mock LLM servers
User->>validate_mocks.py: python validate_mocks.py
validate_mocks.py->>validate_mocks.py: Uses requests library
validate_mocks.py->>User: Health check results
```
1 file reviewed, no comments
Pouyanpi left a comment
Thank you @tgasser-nv. Do we expect the benchmark tools to be used when users install the package from PyPI?
Currently they will be packaged and distributed when users run `pip install nemoguardrails[benchmark]`, but there are some usability issues that will prevent users from running them without significant workarounds.
For example, because the Procfile uses relative paths, the user experience looks like this (a workaround):

```
pip install nemoguardrails[benchmark]

# step 1: find where the configs are located
BENCHMARK_DIR=$(python -c "import os, nemoguardrails.benchmark; \
    print(os.path.dirname(nemoguardrails.benchmark.__file__))")
CONFIG_FILE="$BENCHMARK_DIR/configs/mock_configs/meta-llama-3.3-70b-instruct.env"
echo "Using config: $CONFIG_FILE"

# step 2: run the mock server
python -m nemoguardrails.benchmark.mock_llm_server.run_server \
    --port 8000 \
    --config-file "$CONFIG_FILE"
```

This happens because the Procfile contains:

```
gr: poetry run nemoguardrails server --config configs/guardrail_configs ...
app_llm: poetry run python mock_llm_server/run_server.py --port 8000 ...
cs_llm: poetry run python mock_llm_server/run_server.py --port 8001 ...
```

Paths like `configs/guardrail_configs` and `mock_llm_server/run_server.py` are relative; after `pip install`, these paths don't exist relative to the user's working directory, and the Procfile expects to run from the `nemoguardrails/benchmark/` directory.
So the Procfile cannot be used as-is.
Do we intend to package this? Also, I think a minimal README.md is necessary.
Another example is the hardcoded ports, which users will have a hard time configuring. Please see the comments below.
Please remove poetry.lock and run `poetry lock --no-update`
(better to do a soft reset, remove, and force push, or an interactive rebase and force push)
I reverted the single commit that changed pyproject.toml and poetry.lock to undo these changes.
I see hardcoded ports in multiple places.
The ports (8000, 8001, 9000) are hardcoded in:
- Procfile
- validate_mocks.py
- config.yml

If someone changes one location, they must remember to change all three. Consider:
- environment variables
- a shared config file (a config.py that reads from env vars with defaults; sketched below)
- or, at minimum, documenting this dependency
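For example, such a shared config module might look like this (a hypothetical sketch of the suggestion above, not code from the PR):

```python
# Hypothetical shared config.py: single source of truth for the ports,
# overridable via environment variables. Not part of this PR.
import os

APP_LLM_PORT = int(os.environ.get("APP_LLM_PORT", "8000"))
CS_LLM_PORT = int(os.environ.get("CS_LLM_PORT", "8001"))
GUARDRAILS_PORT = int(os.environ.get("GUARDRAILS_PORT", "9000"))
```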
Yes, this is a limitation at the moment. This is intended for local offline testing, to quantify performance regressions in roughly the time unit tests take today. I could have a central config that propagates out to the Guardrails configs, mock LLM configs, and Procfile to tie all three together and keep them consistent. But I'd prefer to keep it simple and add more documentation to help guide people through configuring everything. I will get a README written up for this.
The Procfile has relative paths that assume we're running from the nemoguardrails/benchmark directory.
If we run honcho from the project root, these paths won't work. We need either:
- absolute paths from the project root
- clear documentation that `honcho start` must run from `nemoguardrails/benchmark/`
This was intentional; the Procfile is only intended for use in the benchmarking directory. If it were at the project root, I'd expect it to spin up a production set of Guardrails services rather than a set of mocked LLMs and Guardrails. I'll add a proper README to explain all this.
Greptile Overview
Greptile Summary
This PR simplifies the benchmark setup by adding a Procfile and honcho-based orchestration, reducing the startup from 4 tmux panes to a single command.
Key Changes:
- Added `Procfile` to orchestrate Guardrails server and two mock LLM servers via honcho
- Added `validate_mocks.py` script using httpx (already in dependencies) to verify health and endpoints
- Added comprehensive unit tests for the validation script
- Reorganized configuration files from `mock_llm_server/configs` to `benchmark/configs` for better structure
- Enhanced `run_server.py` with a `--workers` CLI argument and fixed the module path for `poetry run` compatibility (see the sketch after this list)
- Added detailed README documentation with quickstart guide and configuration explanations
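A sketch of what the `--workers` plumbing in run_server.py might look like (only `--port`, `--workers`, and the absolute module path are confirmed by the PR; the rest is illustrative):

```python
# Hypothetical sketch of the run_server.py entry point; argument defaults
# and structure are assumptions, not the merged implementation.
import argparse

import uvicorn


def main() -> None:
    parser = argparse.ArgumentParser(description="Run a mock LLM server")
    parser.add_argument("--port", type=int, default=8000)
    parser.add_argument("--workers", type=int, default=1)
    args = parser.parse_args()

    # Absolute module path, so the server can be launched from any directory
    uvicorn.run(
        "nemoguardrails.benchmark.mock_llm_server.api:app",
        host="0.0.0.0",
        port=args.port,
        workers=args.workers,
    )


if __name__ == "__main__":
    main()
```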
Integration:
The Procfile coordinates three services: Guardrails server (port 9000), Application LLM mock (port 8000 with 4 workers), and Content Safety LLM mock (port 8001 with 4 workers). The validation script uses httpx to check /health and /v1/models endpoints on the mocks, and /v1/rails/configs on Guardrails.
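A condensed httpx-based sketch of those checks (hypothetical; the real validate_mocks.py is structured as separate functions with logging and error handling):

```python
# Hypothetical condensed version of the checks described above.
import httpx

MOCKS = {
    "app_llm": "http://localhost:8000",
    "cs_llm": "http://localhost:8001",
}

for name, base in MOCKS.items():
    health = httpx.get(f"{base}/health", timeout=5.0).json()
    assert health.get("status") == "healthy", f"{name} is not healthy"
    models = httpx.get(f"{base}/v1/models", timeout=5.0).json()
    print(name, [m["id"] for m in models.get("data", [])])

configs = httpx.get("http://localhost:9000/v1/rails/configs", timeout=5.0).json()
print("guardrails configs:", [c.get("id") for c in configs])
```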
Confidence Score: 5/5
- This PR is safe to merge with minimal risk - it adds helpful tooling without modifying core functionality
- Score reflects excellent code quality with comprehensive testing, clear documentation, proper use of existing dependencies (httpx), well-organized file structure, and no changes to production code paths - only adds developer tooling for benchmarking
- No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| nemoguardrails/benchmark/Procfile | 5/5 | Added Procfile to orchestrate Guardrails server and two mock LLM servers with honcho, enabling single-command startup of the benchmark environment |
| nemoguardrails/benchmark/validate_mocks.py | 5/5 | Added validation script using httpx (already in dependencies) to check health and model endpoints for the mock servers and Guardrails config endpoint |
| tests/benchmark/test_validate_mocks.py | 5/5 | Comprehensive unit tests for validate_mocks.py with excellent coverage of success and failure scenarios |
| nemoguardrails/benchmark/README.md | 5/5 | Added comprehensive documentation explaining the benchmark setup, Procfile usage, and configuration details with clear examples |
| nemoguardrails/benchmark/mock_llm_server/run_server.py | 5/5 | Added --workers CLI argument and fixed module path from 'api:app' to full path for compatibility with poetry run from different directories |
Sequence Diagram
```
sequenceDiagram
participant User
participant Honcho
participant GR as Guardrails Server<br/>(port 9000)
participant AppLLM as App LLM Mock<br/>(port 8000)
participant CSLLM as Content Safety LLM Mock<br/>(port 8001)
participant Validator as validate_mocks.py
Note over User,Honcho: Startup Phase
User->>Honcho: poetry run honcho start
Honcho->>GR: Start Guardrails server
Honcho->>AppLLM: Start app_llm mock (4 workers)
Honcho->>CSLLM: Start cs_llm mock (4 workers)
AppLLM-->>Honcho: Ready on port 8000
CSLLM-->>Honcho: Ready on port 8001
GR-->>Honcho: Ready on port 9000
Note over User,Validator: Validation Phase
User->>Validator: poetry run python validate_mocks.py
Validator->>AppLLM: GET /health
AppLLM-->>Validator: {"status": "healthy"}
Validator->>AppLLM: GET /v1/models
AppLLM-->>Validator: meta/llama-3.3-70b-instruct
Validator->>CSLLM: GET /health
CSLLM-->>Validator: {"status": "healthy"}
Validator->>CSLLM: GET /v1/models
CSLLM-->>Validator: nvidia/llama-3.1-nemoguard-8b-content-safety
Validator->>GR: GET /v1/rails/configs
GR-->>Validator: [config array]
Validator-->>User: All checks PASSED
Note over User,GR: Request Flow
User->>GR: POST /v1/chat/completions
GR->>CSLLM: Input rail: content safety check
CSLLM-->>GR: {"User Safety": "safe"}
GR->>AppLLM: Generate response
AppLLM-->>GR: Response text
GR->>CSLLM: Output rail: content safety check
CSLLM-->>GR: {"Response Safety": "safe"}
GR-->>User: Final response
```
9 files reviewed, no comments
Description
This PR simplifies the spinup of the Guardrails+mock benchmark. Previously 4 tmux panes were needed (Content safety LLM mock, App LLM mock, Guardrails server, curl client calls). See the test plan in #1403 for more details.
After this PR, Guardrails and associated mock LLMs are spun up with a single command using a Procfile and the honcho package. I also added a simple Python `validate_mocks.py` file to ping the health endpoints and check model names on the mock LLMs, and make sure we can read back a Guardrails config from Guardrails.
Follow-on work in future PRs
Test Plan
Server
1. `cd nemoguardrails/benchmark`
2. `poetry shell`
3. `pip install honcho`
4. `poetry run honcho start`, wait for each service to come up

Client
`/chat/completions` test
Unit-tests
Pre-commit checks
Related Issue(s)
Builds on #1403 to simplify spinning up the benchmark.
Checklist