Conversation


@tgasser-nv tgasser-nv commented Nov 3, 2025

Description

This PR simplifies the spin-up of the Guardrails + mock benchmark. Previously, four tmux panes were needed (content safety LLM mock, app LLM mock, Guardrails server, curl client calls). See the test plan in #1403 for more details.

After this PR, Guardrails and the associated mock LLMs are spun up with a single command using a Procfile and the honcho package. I also added a simple Python script, validate_mocks.py, which pings the health endpoints of the mock LLMs, checks their model names, and makes sure we can read back a Guardrails config from the Guardrails server.
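For illustration, the core of such a check might look like the following (a minimal sketch, assuming httpx, which a later revision of the script uses, and the endpoint shapes shown in the test plan below; the exact function signatures in validate_mocks.py may differ):

import httpx

def check_endpoint(port: int, expected_model: str) -> bool:
    # A mock passes if /health reports healthy and /v1/models lists the model.
    base = f"http://localhost:{port}"
    health = httpx.get(f"{base}/health", timeout=5.0).json()
    if health.get("status") != "healthy":
        return False
    models = httpx.get(f"{base}/v1/models", timeout=5.0).json()
    return any(m.get("id") == expected_model for m in models.get("data", []))

if __name__ == "__main__":
    ok = check_endpoint(8000, "meta/llama-3.3-70b-instruct")
    ok = check_endpoint(8001, "nvidia/llama-3.1-nemoguard-8b-content-safety") and ok
    print("PASSED" if ok else "FAILED")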

Follow-on work in future PRs

  • Add NVIDIA aiperf to generate benchmarking load.
  • Benchmark the mock LLMs to find the upper bound of their performance (concurrency).

Test Plan

Server

  1. Change into benchmark directory: cd nemoguardrails/benchmark.
  2. Activate the poetry virtual environment: poetry shell.
  3. Install honcho: pip install honcho.
  4. Drop out of poetry shell (Ctrl+D).
  5. Run honcho with poetry run honcho start and wait for each service to come up:
poetry run honcho start
11:06:34 system    | gr.1 started (pid=12799)
11:06:34 system    | cs_llm.1 started (pid=12800)
11:06:34 system    | app_llm.1 started (pid=12801)
11:06:40 app_llm.1 | 2025-11-03 11:06:40 INFO: Using config file: configs/mock_configs/meta-llama-3.3-70b-instruct.env
11:06:40 cs_llm.1  | 2025-11-03 11:06:40 INFO: Using config file: configs/mock_configs/nvidia-llama-3.1-nemoguard-8b-content-safety.env
11:06:40 cs_llm.1  | 2025-11-03 11:06:40 INFO: Starting Mock LLM Server on 0.0.0.0:8001
11:06:40 app_llm.1 | 2025-11-03 11:06:40 INFO: Starting Mock LLM Server on 0.0.0.0:8000
11:06:40 cs_llm.1  | 2025-11-03 11:06:40 INFO: OpenAPI docs available at: http://0.0.0.0:8001/docs
11:06:40 app_llm.1 | 2025-11-03 11:06:40 INFO: OpenAPI docs available at: http://0.0.0.0:8000/docs
11:06:40 cs_llm.1  | 2025-11-03 11:06:40 INFO: Health check at: http://0.0.0.0:8001/health
11:06:40 app_llm.1 | 2025-11-03 11:06:40 INFO: Health check at: http://0.0.0.0:8000/health
11:06:40 cs_llm.1  | 2025-11-03 11:06:40 INFO: Serving model with config configs/mock_configs/nvidia-llama-3.1-nemoguard-8b-content-safety.env
11:06:40 app_llm.1 | 2025-11-03 11:06:40 INFO: Serving model with config configs/mock_configs/meta-llama-3.3-70b-instruct.env
11:06:40 cs_llm.1  | 2025-11-03 11:06:40 INFO: Press Ctrl+C to stop the server
11:06:40 app_llm.1 | 2025-11-03 11:06:40 INFO: Press Ctrl+C to stop the server
11:06:40 app_llm.1 | INFO:     Loading environment from 'configs/mock_configs/meta-llama-3.3-70b-instruct.env'
11:06:40 cs_llm.1  | INFO:     Loading environment from 'configs/mock_configs/nvidia-llama-3.1-nemoguard-8b-content-safety.env'
11:06:41 cs_llm.1  | INFO:     Started server process [12800]
11:06:41 app_llm.1 | INFO:     Started server process [12801]
11:06:41 cs_llm.1  | INFO:     Waiting for application startup.
11:06:41 app_llm.1 | INFO:     Waiting for application startup.
11:06:41 app_llm.1 | INFO:     Application startup complete.
11:06:41 cs_llm.1  | INFO:     Application startup complete.
11:06:41 cs_llm.1  | INFO:     Uvicorn running on http://0.0.0.0:8001 (Press CTRL+C to quit)
11:06:41 app_llm.1 | INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
11:06:43 gr.1      | INFO:     Started server process [12799]
11:06:43 gr.1      | INFO:     Waiting for application startup.
11:06:43 gr.1      | INFO:     Application startup complete.
11:06:43 gr.1      | INFO:     Uvicorn running on http://0.0.0.0:9000 (Press CTRL+C to quit)

Client

  1. Run the validation script:
$  poetry run python nemoguardrails/benchmark/validate_mocks.py
Starting LLM endpoint health check...

--- Checking Port: 8000 ---
Checking http://localhost:8000/health ...
Health Check PASSED: Status is 'healthy'.
Checking http://localhost:8000/v1/models for 'meta/llama-3.3-70b-instruct'...
Model Check PASSED: Found 'meta/llama-3.3-70b-instruct' in model list.
--- Port 8000: ALL CHECKS PASSED ---

--- Checking Port: 8001 ---
Checking http://localhost:8001/health ...
Health Check PASSED: Status is 'healthy'.
Checking http://localhost:8001/v1/models for 'nvidia/llama-3.1-nemoguard-8b-content-safety'...
Model Check PASSED: Found 'nvidia/llama-3.1-nemoguard-8b-content-safety' in model list.
--- Port 8001: ALL CHECKS PASSED ---

--- Checking Port: 9000 (Rails Config) ---
Checking http://localhost:9000/v1/rails/configs ...
HTTP Status PASSED: Got 200.
Body Check PASSED: Response is an array with at least one entry.
--- Port 9000: ALL CHECKS PASSED ---

--- Final Summary ---
Port 8000 (meta/llama-3.3-70b-instruct): PASSED
Port 8001 (nvidia/llama-3.1-nemoguard-8b-content-safety): PASSED
Port 9000 (Rails Config): PASSED
---------------------
Overall Status: All endpoints are healthy!

  2. Check server logs:
11:08:23 app_llm.1 | 2025-11-03 11:08:23 INFO: Request finished: 200, took 0.000 seconds
11:08:23 app_llm.1 | INFO:     127.0.0.1:60612 - "GET /health HTTP/1.1" 200 OK
11:08:23 app_llm.1 | 2025-11-03 11:08:23 INFO: Request finished: 200, took 0.001 seconds
11:08:23 app_llm.1 | INFO:     127.0.0.1:60613 - "GET /v1/models HTTP/1.1" 200 OK
11:08:23 cs_llm.1  | 2025-11-03 11:08:23 INFO: Request finished: 200, took 0.000 seconds
11:08:23 cs_llm.1  | INFO:     127.0.0.1:60614 - "GET /health HTTP/1.1" 200 OK
11:08:23 cs_llm.1  | 2025-11-03 11:08:23 INFO: Request finished: 200, took 0.001 seconds
11:08:23 cs_llm.1  | INFO:     127.0.0.1:60615 - "GET /v1/models HTTP/1.1" 200 OK
11:08:23 gr.1      | INFO:     127.0.0.1:60616 - "GET /v1/rails/configs HTTP/1.1" 200 OK

/chat/completions test

$  curl -X POST http://0.0.0.0:9000/v1/chat/completions \
   -H 'Accept: application/json' \
   -H 'Content-Type: application/json' \
   -d '{
      "model": "nvidia/llama-3.1-nemoguard-8b-content-safety",
      "messages": [
         {
            "role": "user",
            "content": "what can you do for me?"
         }
      ],
      "max_tokens": 16,
      "stream": false,
      "temperature": 1,
      "top_p": 1
   }' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   619  100   334  100   285     41     35  0:00:08  0:00:08 --:--:--    81
{
  "messages": [
    {
      "role": "assistant",
      "content": "I can provide information and help with a wide range of topics, from science and history to entertainment and culture. I can also help with language-related tasks, such as translation and text summarization. However, I can't assist with requests that involve harm or illegal activities."
    }
  ]
}
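The same request can be issued from Python for scripted benchmarking (a sketch, assuming httpx; the payload mirrors the curl call above, and the response shape matches the output shown):

import httpx

payload = {
    "model": "nvidia/llama-3.1-nemoguard-8b-content-safety",
    "messages": [{"role": "user", "content": "what can you do for me?"}],
    "max_tokens": 16,
    "stream": False,
    "temperature": 1,
    "top_p": 1,
}

# Guardrails returns the guarded conversation as a list of messages.
resp = httpx.post("http://0.0.0.0:9000/v1/chat/completions", json=payload, timeout=30.0)
print(resp.json()["messages"][0]["content"])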

Unit-tests

$  poetry run pytest tests -q
.................................................................................................................. [  5%]
.......................................................................................................s.......... [ 10%]
...........................................................................................................sssssss [ 15%]
........................s......ss................................................................................. [ 20%]
....................................................................................................ss.......s.... [ 26%]
..................................................................................ss........ss...ss............... [ 31%]
.................ss................s...................................................s............s............. [ 36%]
.................................................................................................................. [ 41%]
................................................................................................................ss [ 47%]
sss......sssssssssssssssss.........ssss........................................................................... [ 52%]
......s...........ss..................ssssssss.ssssssssss.....................................................s... [ 57%]
.s.....................................ssssssss..............sss...ss...ss.............................sssssssssss [ 62%]
ss............................................/Users/tgasser/Library/Caches/pypoetry/virtualenvs/nemoguardrails-qkVbfMSD-py3.13/lib/python3.13/site-packages/_pytest/stash.py:108: RuntimeWarning: coroutine 'AsyncMockMixin._execute_mock_call' was never awaited
  del self._storage[key]
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
................s................................................... [ 68%]
.................................................sssssssss.........ss............................................. [ 73%]
..............................................................sssssss............................................. [ 78%]
...................................................s.............................................................. [ 83%]
..........................................ss...................................................................... [ 88%]
.................................................................................................................. [ 94%]
.........................................................................................................s........ [ 99%]
.............                                                                                                      [100%]
2054 passed, 125 skipped in 133.17s (0:02:13)

Pre-commit checks

$ poetry run pre-commit run --all-files
check yaml...............................................................Passed
fix end of files.........................................................Passed
trim trailing whitespace.................................................Passed
isort (python)...........................................................Passed
black....................................................................Passed
Insert license in comments...............................................Passed
pyright..................................................................Passed

Related Issue(s)

Builds on #1403 to simplify spinning up the benchmark.

Checklist

  • I've read the CONTRIBUTING guidelines.
  • I've updated the documentation if applicable.
  • I've added tests if applicable.
  • @mentions of the person or team responsible for reviewing proposed changes.


@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

This PR adds infrastructure for running Guardrails with mock LLM servers for benchmarking and testing. The main changes reorganize configuration files and add orchestration tooling.

Key Changes:

  • Added Procfile to orchestrate three services: Guardrails server (port 9000), app LLM mock (port 8000), and content safety LLM mock (port 8001)
  • Created validate_mocks.py script to verify all services are healthy and correctly configured
  • Fixed module path in run_server.py from relative (api:app) to absolute (nemoguardrails.benchmark.mock_llm_server.api:app) to enable running from any directory
  • Reorganized configs from mock_llm_server/configs/ to benchmark/configs/ with clearer separation between guardrail configs and mock configs

Architecture:
The setup creates a complete local testing environment where the Guardrails server uses two mock LLMs: one for the main application (4s latency) and one for content safety checks (0.5s latency). This allows benchmarking without external API dependencies.
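For context, that module-path fix plausibly amounts to passing the fully qualified app path to uvicorn (a sketch; the actual arguments in run_server.py may differ):

import uvicorn

# A relative "api:app" target only resolves when run from mock_llm_server/;
# the fully qualified module path works from any working directory.
uvicorn.run(
    "nemoguardrails.benchmark.mock_llm_server.api:app",
    host="0.0.0.0",
    port=8000,
)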

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk - it's well-structured infrastructure code for testing
  • The changes are purely additive (new files and file moves) with one critical bug fix to make the mock server runnable from any directory. The code is well-documented, includes proper error handling in the validation script, and follows Python best practices. No production code is modified.
  • No files require special attention

Important Files Changed

File Analysis

  • nemoguardrails/benchmark/Procfile (5/5): Added Procfile to orchestrate three services: Guardrails server on port 9000, app LLM mock on port 8000, and content safety LLM mock on port 8001
  • nemoguardrails/benchmark/validate_mocks.py (5/5): New validation script that checks health and model endpoints for all three services
  • nemoguardrails/benchmark/mock_llm_server/run_server.py (5/5): Updated uvicorn module path from relative to absolute: api:app to nemoguardrails.benchmark.mock_llm_server.api:app

Sequence Diagram

sequenceDiagram
    participant User
    participant Procfile
    participant GR as Guardrails Server<br/>(port 9000)
    participant AppLLM as App LLM Mock<br/>(port 8000)
    participant CSLLM as Content Safety Mock<br/>(port 8001)
    participant Val as validate_mocks.py

    Note over Procfile: Orchestrates 3 services
    Procfile->>GR: Start nemoguardrails server<br/>--config configs/guardrail_configs
    Procfile->>AppLLM: Start mock_llm_server<br/>meta-llama-3.3-70b-instruct
    Procfile->>CSLLM: Start mock_llm_server<br/>nemoguard-8b-content-safety

    Note over Val: Validation Script
    Val->>AppLLM: GET /health
    AppLLM-->>Val: {"status": "healthy"}
    Val->>AppLLM: GET /v1/models
    AppLLM-->>Val: [meta/llama-3.3-70b-instruct]
    
    Val->>CSLLM: GET /health
    CSLLM-->>Val: {"status": "healthy"}
    Val->>CSLLM: GET /v1/models
    CSLLM-->>Val: [nvidia/llama-3.1-nemoguard-8b-content-safety]
    
    Val->>GR: GET /v1/rails/configs
    GR-->>Val: [content_safety_colang1]

    Note over User,CSLLM: Runtime Flow
    User->>GR: POST /v1/chat/completions
    GR->>CSLLM: Check input safety
    CSLLM-->>GR: Safety result (0.5s)
    GR->>AppLLM: Generate response
    AppLLM-->>GR: LLM response (4s)
    GR->>CSLLM: Check output safety
    CSLLM-->>GR: Safety result (0.5s)
    GR-->>User: Guarded response

7 files reviewed, no comments



codecov bot commented Nov 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.



@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

Added infrastructure to simplify the benchmark setup by introducing a Procfile for running Guardrails and mock LLM servers with a single command using honcho. This replaces the previous 4-tmux-pane manual setup.

Key changes:

  • Added Procfile to orchestrate Guardrails server and two mock LLM servers (app_llm on port 8000, cs_llm on port 8001)
  • Added validate_mocks.py script to verify mock LLM health endpoints and Guardrails config endpoint
  • Restructured configs into guardrail_configs and mock_configs directories
  • All servers now start with poetry run honcho start from the benchmark directory

The changes follow established patterns from the related PR #1403 and provide better developer experience for benchmarking workflows.

Confidence Score: 4/5

  • This PR is safe to merge with low risk
  • The changes add new benchmark tooling without modifying core Guardrails functionality. The validation script is well-structured with proper error handling, timeout protection, and clear logging. All tests pass and pre-commit checks succeed. The only minor consideration is the hardcoded ports and model names in validate_mocks.py, which reduces flexibility but is acceptable for a development/testing tool.
  • No files require special attention

Important Files Changed

File Analysis

  • nemoguardrails/benchmark/validate_mocks.py (4/5): New validation script to health-check the mock LLM servers and the Guardrails config endpoint

Sequence Diagram

sequenceDiagram
    participant User
    participant Honcho
    participant GR as Guardrails Server<br/>(port 9000)
    participant APP as App LLM Mock<br/>(port 8000)
    participant CS as Content Safety Mock<br/>(port 8001)
    participant Validator as validate_mocks.py

    User->>Honcho: poetry run honcho start
    Honcho->>GR: Start Guardrails server
    Honcho->>APP: Start App LLM mock
    Honcho->>CS: Start Content Safety mock
    
    Note over GR,CS: All services start concurrently
    
    GR-->>Honcho: Server ready (9000)
    APP-->>Honcho: Server ready (8000)
    CS-->>Honcho: Server ready (8001)
    
    User->>Validator: poetry run python validate_mocks.py
    
    Validator->>APP: GET /health
    APP-->>Validator: {"status": "healthy"}
    
    Validator->>APP: GET /v1/models
    APP-->>Validator: model list
    
    Validator->>CS: GET /health
    CS-->>Validator: {"status": "healthy"}
    
    Validator->>CS: GET /v1/models
    CS-->>Validator: model list
    
    Validator->>GR: GET /v1/rails/configs
    GR-->>Validator: configs array
    
    Validator-->>User: All endpoints healthy!

1 file reviewed, no comments


@tgasser-nv tgasser-nv requested a review from trebedea November 3, 2025 17:31

@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

This PR simplifies the benchmark setup process by adding infrastructure to run Guardrails and mock LLM servers together. The key changes include:

  • Procfile: Added to orchestrate 3 services (Guardrails server on port 9000, App LLM mock on port 8000, Content Safety LLM mock on port 8001) using the honcho package
  • Config restructuring: Moved mock and Guardrails configs from mock_llm_server/configs/ to benchmark/configs/ for better organization
  • Validation script: Added validate_mocks.py to verify health and model endpoints of all services
  • Import fix: Updated run_server.py to use fully qualified module path (nemoguardrails.benchmark.mock_llm_server.api:app) instead of relative import
  • Logging improvement: Converted f-string logging statements to %s format (Python logging best practice to avoid unnecessary string interpolation)

The PR successfully reduces the complexity of starting the benchmark environment from 4 separate tmux panes to a single honcho start command, improving developer experience.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • Score reflects simple infrastructure changes that improve developer workflow. The validation script follows best practices (%s-style logging, proper error handling, clear exit codes). All tests pass, pre-commit checks pass, and the test plan demonstrates working functionality. The only minor note is that the requests library needs to be installed separately, as documented.
  • No files require special attention

Important Files Changed

File Analysis

  • nemoguardrails/benchmark/validate_mocks.py (5/5): Converted f-string logging statements to %s format (best practice for logging). Script validates health of the mock LLM servers and Guardrails endpoints.

Sequence Diagram

sequenceDiagram
    participant User
    participant ValidateMocks as validate_mocks.py
    participant AppLLM as Mock LLM (8000)
    participant CSLLM as Content Safety LLM (8001)
    participant Guardrails as Guardrails (9000)

    User->>ValidateMocks: Run validation script
    
    ValidateMocks->>AppLLM: GET /health
    AppLLM-->>ValidateMocks: {"status": "healthy"}
    ValidateMocks->>AppLLM: GET /v1/models
    AppLLM-->>ValidateMocks: {"data": [{"id": "meta/llama-3.3-70b-instruct"}]}
    
    ValidateMocks->>CSLLM: GET /health
    CSLLM-->>ValidateMocks: {"status": "healthy"}
    ValidateMocks->>CSLLM: GET /v1/models
    CSLLM-->>ValidateMocks: {"data": [{"id": "nvidia/llama-3.1-nemoguard-8b-content-safety"}]}
    
    ValidateMocks->>Guardrails: GET /v1/rails/configs
    Guardrails-->>ValidateMocks: [config_array]
    
    ValidateMocks->>User: Report all checks passed (exit 0)

1 file reviewed, 1 comment



@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

Adds a comprehensive test suite for the validate_mocks.py validation script. The tests thoroughly cover health checks, model validation, and Rails endpoint verification with proper mocking.

Key changes:

  • Comprehensive test coverage for check_endpoint() function including success, failure, timeout, and JSON parsing error cases
  • Full test coverage for check_rails_endpoint() function with various response scenarios
  • Tests for the main() function verifying proper exit codes and error handling
  • Script execution tests to verify the module can be imported and run directly

Issue found:

  • Line 464 contains a hardcoded absolute path that will fail in CI/CD and other environments

Confidence Score: 4/5

  • Safe to merge after fixing the hardcoded path issue on line 464
  • The test suite is comprehensive and well-structured with excellent coverage of edge cases. However, the hardcoded absolute path on line 464 will cause test failures in CI/CD environments and on different developer machines, preventing the test from running successfully
  • tests/benchmark/test_validate_mocks.py:464 requires fixing the hardcoded path

Important Files Changed

File Analysis

  • tests/benchmark/test_validate_mocks.py (4/5): Comprehensive test suite for validate_mocks.py with one critical hardcoded path issue (line 464)

Sequence Diagram

sequenceDiagram
    participant Test as Test Suite
    participant Mock as Mock requests.get
    participant CheckEP as check_endpoint()
    participant CheckRails as check_rails_endpoint()
    participant Main as main()
    
    Test->>Mock: Setup mock responses
    Test->>CheckEP: Call with port & model
    CheckEP->>Mock: GET /health
    Mock-->>CheckEP: {"status": "healthy"}
    CheckEP->>Mock: GET /v1/models
    Mock-->>CheckEP: {"data": [{"id": "model"}]}
    CheckEP-->>Test: (success, summary)
    
    Test->>CheckRails: Call with port
    CheckRails->>Mock: GET /v1/rails/configs
    Mock-->>CheckRails: [config1, config2]
    CheckRails-->>Test: (success, summary)
    
    Test->>Main: Call main()
    Main->>CheckEP: Check port 8000
    Main->>CheckEP: Check port 8001
    Main->>CheckRails: Check port 9000
    Main-->>Test: sys.exit(0 or 1)

1 file reviewed, 1 comment



@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

Added comprehensive unit tests for validate_mocks.py script that validates mock LLM endpoints and Guardrails configuration. The tests cover:

  • check_endpoint function: 14 test cases covering health checks, model validation, connection errors, timeouts, and JSON parsing errors
  • check_rails_endpoint function: 7 test cases covering Rails config validation, response structure checks, and error handling
  • main function: 4 test cases covering success and failure scenarios with proper exit code validation

All tests use proper mocking with unittest.mock to avoid external dependencies. Tests follow pytest conventions and include clear docstrings. The test file matches the structure and error handling paths in the source code.
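A representative test in this style might look like the following (a sketch, not code from the PR; it assumes the requests-based version of the script and a check_endpoint(port, model) signature returning a (success, summary) tuple, as the sequence diagrams above suggest):

from unittest.mock import MagicMock, patch

from nemoguardrails.benchmark import validate_mocks

@patch("nemoguardrails.benchmark.validate_mocks.requests.get")
def test_check_endpoint_healthy_and_model_found(mock_get):
    # First call serves /health, second serves /v1/models.
    health = MagicMock(status_code=200)
    health.json.return_value = {"status": "healthy"}
    models = MagicMock(status_code=200)
    models.json.return_value = {"data": [{"id": "meta/llama-3.3-70b-instruct"}]}
    mock_get.side_effect = [health, models]

    success, _summary = validate_mocks.check_endpoint(8000, "meta/llama-3.3-70b-instruct")
    assert success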

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The changes add comprehensive unit tests for the validate_mocks.py script with excellent coverage of edge cases, error handling, and success paths. All tests use proper mocking to avoid external dependencies, follow pytest conventions, and the PR description shows tests pass successfully (2054 passed, 125 skipped). The test code is well-structured, properly documented, and thoroughly covers all functions in the source file.
  • No files require special attention

Important Files Changed

File Analysis

  • tests/benchmark/test_validate_mocks.py (5/5): New comprehensive unit test file covering all functions in validate_mocks.py with extensive edge case handling

Sequence Diagram

sequenceDiagram
    participant User
    participant Honcho
    participant GR as Guardrails Server<br/>(port 9000)
    participant AppLLM as App LLM Mock<br/>(port 8000)
    participant CSLLM as Content Safety LLM Mock<br/>(port 8001)
    participant Script as validate_mocks.py

    User->>Honcho: honcho start (reads Procfile)
    Honcho->>GR: Start Guardrails server
    Honcho->>AppLLM: Start App LLM mock
    Honcho->>CSLLM: Start Content Safety LLM mock
    
    Note over GR,CSLLM: All services start in parallel

    User->>Script: python validate_mocks.py
    
    Script->>AppLLM: GET /health
    AppLLM-->>Script: 200 OK {"status": "healthy"}
    
    Script->>AppLLM: GET /v1/models
    AppLLM-->>Script: 200 OK {"data": [{"id": "meta/llama-3.3-70b-instruct"}]}
    
    Script->>CSLLM: GET /health
    CSLLM-->>Script: 200 OK {"status": "healthy"}
    
    Script->>CSLLM: GET /v1/models
    CSLLM-->>Script: 200 OK {"data": [{"id": "nvidia/llama-3.1-nemoguard-8b-content-safety"}]}
    
    Script->>GR: GET /v1/rails/configs
    GR-->>Script: 200 OK [{"id": "content_safety_colang1"}]
    
    Script-->>User: All checks passed (exit 0)

1 file reviewed, no comments



@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

This PR adds a new benchmark extras category to pyproject.toml to install honcho and requests packages. These dependencies support the benchmarking infrastructure introduced in PR #1403, allowing users to run Guardrails and mock LLMs using a Procfile with a single command.

Key Changes:

  • Added requests >= 2.31.0 and honcho >= 1.1.0 as optional dependencies
  • Created new benchmark extras group for easy installation via pip install nemoguardrails[benchmark]
  • Updated all extras group to include benchmark dependencies
  • Updated poetry.lock with dependency resolution (includes version bumps for aiofiles and aiohttp)

The changes are minimal, well-structured, and follow existing patterns in the project.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The changes only add optional dependencies to pyproject.toml and update the lock file. No production code is modified. The dependencies are well-established packages (requests and honcho) with appropriate version constraints. Tests exist for the code that uses these dependencies.
  • No files require special attention

Important Files Changed

File Analysis

  • pyproject.toml (5/5): Adds benchmark extras group with requests and honcho dependencies for benchmark tooling. Changes are well-structured and follow existing patterns.
  • poetry.lock (5/5): Lock file updated to include honcho and requests with their transitive dependencies. Standard poetry lock file update.

Sequence Diagram

sequenceDiagram
    participant User
    participant pip/poetry
    participant pyproject.toml
    participant poetry.lock
    participant honcho
    participant validate_mocks.py

    User->>pip/poetry: pip install nemoguardrails[benchmark]
    pip/poetry->>pyproject.toml: Read benchmark extras
    pyproject.toml-->>pip/poetry: requests>=2.31.0, honcho>=1.1.0
    pip/poetry->>poetry.lock: Resolve dependencies
    poetry.lock-->>pip/poetry: Resolved versions
    pip/poetry->>User: Installation complete
    
    User->>honcho: honcho start (via Procfile)
    honcho->>honcho: Start Guardrails server
    honcho->>honcho: Start Mock LLM servers
    
    User->>validate_mocks.py: python validate_mocks.py
    validate_mocks.py->>validate_mocks.py: Uses requests library
    validate_mocks.py->>User: Health check results

1 file reviewed, no comments



@Pouyanpi Pouyanpi left a comment


Thank you @tgasser-nv. Do we expect the benchmark tools to be used when users install from PyPI?
Currently they will be packaged and distributed when users run pip install nemoguardrails[benchmark], but there are some usability issues that will prevent users from running them without significant workarounds.

For example, since the Procfile uses relative paths, the user experience looks like this (a workaround):

pip install nemoguardrails[benchmark]

# step 1: find where configs are located?
BENCHMARK_DIR=$(python -c "import os, nemoguardrails.benchmark; \
  print(os.path.dirname(nemoguardrails.benchmark.__file__))")
CONFIG_FILE="$BENCHMARK_DIR/configs/mock_configs/meta-llama-3.3-70b-instruct.env"

echo "Using config: $CONFIG_FILE"

# step 2: run the mock server
python -m nemoguardrails.benchmark.mock_llm_server.run_server \
  --port 8000 \
  --config-file "$CONFIG_FILE"

This happens because the Procfile contains:

gr: poetry run nemoguardrails server --config configs/guardrail_configs ...
app_llm: poetry run python mock_llm_server/run_server.py --port 8000 ...
cs_llm: poetry run python mock_llm_server/run_server.py --port 8001 ...

Paths like configs/guardrail_configs and mock_llm_server/run_server.py are relative; after pip install they don't exist relative to the user's working directory, and the Procfile expects to be run from the nemoguardrails/benchmark/ directory.
So the Procfile cannot be used as-is.

Do we intend to package this? Also, I think a minimal README.md is necessary.

Another example is the hard-coded ports, which users will have a hard time configuring. Please see the comments below.

Collaborator


please remove poetry.lock and do poetry lock --no-update

(better to do a soft reset, remove, force push or interactive rebase and force push)

Collaborator Author


I reverted the single commit that changed pyproject.toml and poetry.lock.

Collaborator


I see hardcoded ports in multiple places

The ports (8000, 8001, 9000) are hardcoded in:

  • Procfile
  • validate_mocks.py
  • config.yml

If someone changes one location, they must remember to change all three. Consider:

  • environment variables
  • a shared config file (a config.py that reads from env vars with defaults) — see the sketch after this list
  • or at minimum, we should document this dependency
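For illustration, such a shared config module might look like this (a sketch, not code from the PR; the env var names are hypothetical):

import os

# Hypothetical single source of truth for the benchmark ports. Each consumer
# (Procfile, validate_mocks.py, config.yml templating) would read these
# instead of hardcoding 8000/8001/9000.
APP_LLM_PORT = int(os.environ.get("BENCHMARK_APP_LLM_PORT", "8000"))
CS_LLM_PORT = int(os.environ.get("BENCHMARK_CS_LLM_PORT", "8001"))
GUARDRAILS_PORT = int(os.environ.get("BENCHMARK_GUARDRAILS_PORT", "9000"))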

Collaborator Author


Yes, this is a limitation at the moment. This is intended for local offline testing, to quantify performance regressions in roughly the time unit tests take today. I could have a central config that propagates out to the Guardrails configs, mock LLM configs, and Procfile to tie all three together and keep them consistent, but I'd prefer to keep it simple and add more documentation to help guide people through configuring everything. Will get a README written up for this.

Collaborator


The Procfile has relative paths that assume we're running from the nemoguardrails/benchmark directory.

If we run honcho from the project root, these paths won't work. We need either:

  • absolute paths from the project root
  • clear documentation that honcho start must be run from nemoguardrails/benchmark/

Collaborator Author


This was intentional; the Procfile is only intended for use in the benchmarking directory. If it were at the project root, I'd expect it to spin up a production set of Guardrails services rather than mocked LLMs and Guardrails. I'll add a proper README to explain all this.

@tgasser-nv tgasser-nv marked this pull request as draft November 5, 2025 20:51
@tgasser-nv tgasser-nv force-pushed the feat/use-honcho-benchmarking branch from a04215f to d379bc5 on November 6, 2025 22:49
@github-actions

Documentation preview

https://nvidia-nemo.github.io/Guardrails/review/pr-1490

@tgasser-nv tgasser-nv marked this pull request as ready for review November 10, 2025 14:09
@tgasser-nv tgasser-nv requested a review from Pouyanpi November 10, 2025 14:09

@greptile-apps greptile-apps bot left a comment


Greptile Overview

Greptile Summary

This PR simplifies the benchmark setup by adding a Procfile and honcho-based orchestration, reducing the startup from 4 tmux panes to a single command.

Key Changes:

  • Added Procfile to orchestrate Guardrails server and two mock LLM servers via honcho
  • Added validate_mocks.py script using httpx (already in dependencies) to verify health and endpoints
  • Added comprehensive unit tests for the validation script
  • Reorganized configuration files from mock_llm_server/configs to benchmark/configs for better structure
  • Enhanced run_server.py with --workers CLI argument and fixed module path for poetry run compatibility
  • Added detailed README documentation with quickstart guide and configuration explanations

Integration:
The Procfile coordinates three services: Guardrails server (port 9000), Application LLM mock (port 8000 with 4 workers), and Content Safety LLM mock (port 8001 with 4 workers). The validation script uses httpx to check /health and /v1/models endpoints on the mocks, and /v1/rails/configs on Guardrails.
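For reference, the Guardrails-side check amounts to verifying that /v1/rails/configs returns a non-empty array (a sketch, assuming httpx as noted above; the real check_rails_endpoint signature may differ):

import httpx

def check_rails_endpoint(port: int = 9000) -> bool:
    # Guardrails is considered healthy if it serves at least one rails config.
    resp = httpx.get(f"http://localhost:{port}/v1/rails/configs", timeout=5.0)
    if resp.status_code != 200:
        return False
    configs = resp.json()
    return isinstance(configs, list) and len(configs) > 0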

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk - it adds helpful tooling without modifying core functionality
  • Score reflects excellent code quality with comprehensive testing, clear documentation, proper use of existing dependencies (httpx), well-organized file structure, and no changes to production code paths - only adds developer tooling for benchmarking
  • No files require special attention

Important Files Changed

File Analysis

  • nemoguardrails/benchmark/Procfile (5/5): Added Procfile to orchestrate the Guardrails server and two mock LLM servers with honcho, enabling single-command startup of the benchmark environment
  • nemoguardrails/benchmark/validate_mocks.py (5/5): Added validation script using httpx (already in dependencies) to check health and model endpoints for the mock servers and the Guardrails config endpoint
  • tests/benchmark/test_validate_mocks.py (5/5): Comprehensive unit tests for validate_mocks.py with excellent coverage of success and failure scenarios
  • nemoguardrails/benchmark/README.md (5/5): Added comprehensive documentation explaining the benchmark setup, Procfile usage, and configuration details with clear examples
  • nemoguardrails/benchmark/mock_llm_server/run_server.py (5/5): Added --workers CLI argument and fixed the module path from 'api:app' to the full path for compatibility with poetry run from different directories

Sequence Diagram

sequenceDiagram
    participant User
    participant Honcho
    participant GR as Guardrails Server<br/>(port 9000)
    participant AppLLM as App LLM Mock<br/>(port 8000)
    participant CSLLM as Content Safety LLM Mock<br/>(port 8001)
    participant Validator as validate_mocks.py

    Note over User,Honcho: Startup Phase
    User->>Honcho: poetry run honcho start
    Honcho->>GR: Start Guardrails server
    Honcho->>AppLLM: Start app_llm mock (4 workers)
    Honcho->>CSLLM: Start cs_llm mock (4 workers)
    
    AppLLM-->>Honcho: Ready on port 8000
    CSLLM-->>Honcho: Ready on port 8001
    GR-->>Honcho: Ready on port 9000

    Note over User,Validator: Validation Phase
    User->>Validator: poetry run python validate_mocks.py
    Validator->>AppLLM: GET /health
    AppLLM-->>Validator: {"status": "healthy"}
    Validator->>AppLLM: GET /v1/models
    AppLLM-->>Validator: meta/llama-3.3-70b-instruct
    
    Validator->>CSLLM: GET /health
    CSLLM-->>Validator: {"status": "healthy"}
    Validator->>CSLLM: GET /v1/models
    CSLLM-->>Validator: nvidia/llama-3.1-nemoguard-8b-content-safety
    
    Validator->>GR: GET /v1/rails/configs
    GR-->>Validator: [config array]
    Validator-->>User: All checks PASSED

    Note over User,GR: Request Flow
    User->>GR: POST /v1/chat/completions
    GR->>CSLLM: Input rail: content safety check
    CSLLM-->>GR: {"User Safety": "safe"}
    GR->>AppLLM: Generate response
    AppLLM-->>GR: Response text
    GR->>CSLLM: Output rail: content safety check
    CSLLM-->>GR: {"Response Safety": "safe"}
    GR-->>User: Final response

9 files reviewed, no comments

