
Conversation


@drorIvry drorIvry commented Jun 28, 2025

  • --wip-- [skip ci]
  • --wip-- [skip ci]

Summary by CodeRabbit

  • New Features

    • Added support for configuring and running multiple parallel evaluation runs (up to 10) in the UI, including a new slider for "Number of parallel evaluation runs."
    • The scenario runner UI now displays grouped status and chat outputs for each parallel run, providing real-time feedback per batch.
    • Introduced prompt injection evaluation with asynchronous testing of AI agents against prompt injection payloads and automated judgment using an LLM.
    • Added a new configuration parameter for parallel runs integrated throughout the UI and evaluation logic.
    • Added HTTP authorization header support for multiple auth types including API key, bearer token, and basic auth.
  • Refactor

    • Scenario evaluation is now performed asynchronously in parallel batches, improving efficiency and user feedback.
    • The configuration and scenario runner screens were updated to handle the new parallelism setting and outputs.
    • Simplified evaluation results handling and aggregation logic.
    • Refined scenario evaluation service to include prompt injection scenarios with error handling and status updates.
    • Updated evaluation results merging to be instance-based and added truthiness checks.
  • Style

    • Minor text adjustment to the "Deep Test Mode" description for clarity.
  • Chores

    • Updated pre-commit configuration formatting and removed explicit Python version and arguments.
    • Added module imports to package initialization for prompt injection evaluation.
    • Improved type alias declarations and constructor formatting for better code clarity.
  • Enhancements

    • Improved summary generation with clearer formatting instructions, including emoji indicators for pass/fail statuses.

@drorIvry drorIvry requested review from Copilot and yuval-qf June 28, 2025 08:55

coderabbitai bot commented Jun 28, 2025

Walkthrough

The changes introduce support for parallel scenario evaluation runs by adding a parallel_runs parameter to the agent configuration and UI. The scenario runner UI is refactored to handle multiple concurrent evaluations with asynchronous workers, updating the interface per run. Configuration, evaluation result handling, and summary generation are updated to accommodate parallelism. A new prompt injection evaluation feature is added with related models and async evaluation logic. Minor formatting, type aliasing, and prompt text refinements were also made.
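For orientation, a minimal sketch of the batching-plus-worker pattern the walkthrough describes. Names such as split_into_batches, worker, and the (worker_id, update_type, data) queue tuple are illustrative assumptions, not the exact code in scenario_runner.py:

```python
import asyncio
from typing import Any

import numpy as np


def split_into_batches(scenarios: list[str], n: int) -> list[list[str]]:
    """Split scenarios into at most n non-empty batches of near-equal size."""
    if not scenarios or n <= 0:
        return []
    return [arr.tolist() for arr in np.array_split(scenarios, n) if arr.size > 0]


async def run_parallel(scenarios: list[str], parallel_runs: int) -> None:
    batches = split_into_batches(scenarios, parallel_runs)
    queue: asyncio.Queue[tuple[int, str, Any]] = asyncio.Queue()

    async def worker(batch: list[str], worker_id: int) -> None:
        # Evaluate one batch; push status updates keyed by worker id.
        for scenario in batch:
            await queue.put((worker_id, "status", f"evaluating {scenario}"))
        await queue.put((worker_id, "done", None))

    tasks = [asyncio.create_task(worker(b, i)) for i, b in enumerate(batches)]

    finished = 0
    while finished < len(batches):
        worker_id, update_type, data = await queue.get()
        if update_type == "done":
            finished += 1
        else:
            print(f"[run {worker_id}] {data}")  # the UI instead updates per-run widgets

    # Surface any worker exceptions once the status loop drains.
    await asyncio.gather(*tasks, return_exceptions=True)


if __name__ == "__main__":
    asyncio.run(run_parallel([f"scenario-{i}" for i in range(7)], parallel_runs=3))
```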

Changes

| File(s) | Change Summary |
|---|---|
| .pre-commit-config.yaml | Removed explicit Python version for Black; reformatted and simplified mypy and bandit hook configs. |
| rogue/models/config.py | Added parallel_runs: int = 1 field to AgentConfig; added get_auth_header function for auth headers (see the sketch after this table). |
| rogue/models/evaluation_result.py | Replaced the combine classmethod with an instance method; added a __bool__ method for truthiness; simplified merging. |
| rogue/models/prompt_injection.py | Added Pydantic models for prompt injection evaluation: payload, evaluation, and result classes. |
| rogue/services/scenario_evaluation_service.py | Added an async method to evaluate prompt injection scenarios; yields status updates; changed the final yield to "done". |
| rogue/services/llm_service.py | Updated the summary prompt to show pass/fail status as emojis (✅, ❌) in the detailed breakdown table; fixed the dataset name. |
| rogue/ui/app.py | Added parallel_runs config support in UI state and loading logic. |
| rogue/ui/components/config_screen.py | Added a slider for parallel_runs; updated save logic to persist and propagate the new setting; minor text tweak. |
| rogue/ui/components/scenario_runner.py | Refactored for parallel scenario evaluation: batch splitting, async workers, dynamic UI per run, result merging. |
| rogue/__init__.py | Imported the prompt_injection_evaluator module into the package namespace. |
| rogue/prompt_injection_evaluator/__init__.py | Imported the run_prompt_injection_evaluator module. |
| rogue/prompt_injection_evaluator/run_prompt_injection_evaluator.py | Added async prompt injection evaluation logic using a judge LLM, dataset loading, and result aggregation. |
| rogue/common/remote_agent_connection.py | Added TypeAlias annotations for type aliases; reformatted constructor parameters for readability. |
| rogue/evaluator_agent/run_evaluator_agent.py | Replaced the local auth header helper with the imported get_auth_header function from the config module. |
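To make the config.py row concrete, here is a plausible shape for the new field and helper. The enum member names, the API-key header choice, and the exact signature are assumptions inferred from this thread (the review below references AuthType and get_auth_header in rogue/models/config.py), not the actual source:

```python
from enum import Enum
from typing import Optional

from pydantic import BaseModel


class AuthType(str, Enum):
    # Member names are assumed for illustration.
    NO_AUTH = "no_auth"
    API_KEY = "api_key"
    BEARER_TOKEN = "bearer_token"
    BASIC_AUTH = "basic_auth"


class AgentConfig(BaseModel):
    # ...existing fields omitted...
    parallel_runs: int = 1  # new: number of concurrent evaluation runs (UI caps at 10)


def get_auth_header(
    auth_type: Optional[AuthType],
    auth_credentials: Optional[str],
) -> dict[str, str]:
    """Build an HTTP Authorization header for the configured auth scheme."""
    if not auth_type or auth_type == AuthType.NO_AUTH or not auth_credentials:
        return {}
    if auth_type == AuthType.API_KEY:
        return {"X-API-Key": auth_credentials}  # assumed header name; real mapping may differ
    if auth_type == AuthType.BEARER_TOKEN:
        return {"Authorization": f"Bearer {auth_credentials}"}
    if auth_type == AuthType.BASIC_AUTH:
        return {"Authorization": f"Basic {auth_credentials}"}
    return {}
```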

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant UI
    participant ConfigScreen
    participant ScenarioRunner
    participant ScenarioEvaluationService
    participant LLMService

    User->>UI: Load App
    UI->>ConfigScreen: Show config (includes parallel_runs slider)
    User->>ConfigScreen: Set parallel_runs and save config
    ConfigScreen->>UI: Update shared state with parallel_runs

    User->>ScenarioRunner: Click "Run" to evaluate scenarios
    ScenarioRunner->>ScenarioRunner: Split scenarios into batches (parallel_runs)
    loop For each batch
        ScenarioRunner->>ScenarioEvaluationService: Start async worker for batch
        ScenarioEvaluationService->>ScenarioRunner: Yield status/chat updates
    end
    ScenarioRunner->>ScenarioEvaluationService: Evaluate prompt injection scenarios asynchronously
    ScenarioEvaluationService->>ScenarioRunner: Yield prompt injection evaluation updates
    ScenarioRunner->>ScenarioRunner: Merge all EvaluationResults
    ScenarioRunner->>LLMService: Generate summary from combined results
    ScenarioRunner->>UI: Update UI with results, summary, and status per run

Poem

In parallel lines the agents run,
No longer working one by one.
Sliders and screens now set the pace,
Results combine in swift embrace.
Prompt injections met with care,
Rabbits cheer the thorough fare!
🐇✨🔍


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between 0d9579c and 9ffe1e8.

📒 Files selected for processing (1)
  • rogue/ui/components/config_screen.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • rogue/ui/components/config_screen.py


Copilot AI left a comment

Pull Request Overview

This PR introduces parallel scenario evaluation to improve testing speed and UI responsiveness by running multiple evaluation tasks concurrently. Key changes include adding a parallel_runs configuration in the config screen and AgentConfig, refactoring the scenario runner UI and logic to manage multiple concurrent evaluation tasks, and updating the evaluation service to aggregate results from parallel runners.

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| rogue/ui/components/scenario_runner.py | Updated to implement parallel evaluation with batched scenarios and multiple UI output groups. |
| rogue/ui/components/config_screen.py | Added a new slider for parallel_runs and updated configuration logic. |
| rogue/ui/app.py | Adjusted to pass the parallel_runs value through the app components. |
| rogue/services/scenario_evaluation_service.py | Updated to aggregate evaluation results using the new combine method. |
| rogue/models/evaluation_result.py | Revised the combine method in EvaluationResults for merging results. |
| rogue/models/config.py | Introduced the parallel_runs field in AgentConfig. |
| parallel_evaluation_plan.md | Added documentation for the new parallel evaluation architecture. |
| .pre-commit-config.yaml | Minor formatting configuration changes for pre-commit hooks. |
Comments suppressed due to low confidence (1)

.pre-commit-config.yaml:15

  • [nitpick] Removing the language_version for the Black hook may lead to inconsistent formatting if different environments use varying Python versions. It might be beneficial to specify the version to ensure consistency.
      - id: black

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
rogue/ui/components/scenario_runner.py (1)

158-161: Store and await worker tasks for proper task management.

The current implementation creates tasks but doesn't store or await them, which can lead to poor error propagation and potential task loss.

Apply this fix to properly manage the worker tasks:

-        [
-            asyncio.create_task(worker(batch, i))
-            for i, batch in enumerate(scenario_batches)
-        ]
+        worker_tasks = [
+            asyncio.create_task(worker(batch, i))
+            for i, batch in enumerate(scenario_batches)
+        ]

Consider adding task cleanup after the main processing loop:

# After line 190, add:
# Cancel any remaining tasks in case of early termination
for task in worker_tasks:
    if not task.done():
        task.cancel()

# Await all tasks to ensure proper cleanup
await asyncio.gather(*worker_tasks, return_exceptions=True)
🧹 Nitpick comments (2)
.pre-commit-config.yaml (1)

21-24: Optional: tighten Bandit execution

Bandit will run with its default profile (-lll).
If you already maintain a customised bandit.yml or wish to fail the hook only on high-severity issues, consider pinning the configuration under the hook:

       - id: bandit
+        args: ["-c", "path/to/bandit.yml", "-lll"]

Not critical, but makes the security scan predictable across developers.

rogue/ui/components/config_screen.py (1)

83-90: Add user guidance for parallel runs configuration.

The parallel runs slider is well-implemented, but consider adding explanatory text to help users understand what this setting does and potential resource implications of higher values.

Consider adding guidance similar to other settings:

         parallel_runs = gr.Slider(
             label="Number of parallel evaluation runs",
             minimum=1,
             maximum=10,
             step=1,
             value=config_data.get("parallel_runs", 1),
         )
+        gr.Markdown(
+            "Controls how many scenario evaluations run concurrently. "
+            "Higher values may improve performance but consume more resources."
+        )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between de71cf7 and a95864d.

📒 Files selected for processing (7)
  • .pre-commit-config.yaml (1 hunks)
  • rogue/models/config.py (1 hunks)
  • rogue/models/evaluation_result.py (1 hunks)
  • rogue/services/scenario_evaluation_service.py (1 hunks)
  • rogue/ui/app.py (3 hunks)
  • rogue/ui/components/config_screen.py (6 hunks)
  • rogue/ui/components/scenario_runner.py (2 hunks)
🔇 Additional comments (7)
.pre-commit-config.yaml (1)

17-20: Verify that mypy still picks up your project configuration

The args: ["--config-file=..."] stanza was removed.
If your codebase keeps its type-checker settings exclusively in pyproject.toml / mypy.ini, nothing breaks.
However, if the config file lives elsewhere (e.g. configs/mypy.ini) or you rely on extra CLI flags, the hook will now run with mypy’s defaults and may flood CI with false positives.

Would you double-check by running pre-commit run mypy --all-files locally and confirm it honours the intended configuration? If it doesn’t, re-add the explicit args key:

       - id: mypy
+        args: ["--config-file=path/to/mypy.ini"]
rogue/models/config.py (1)

22-22: LGTM! Clean addition of parallel runs configuration.

The new parallel_runs field is well-implemented with an appropriate default value of 1, ensuring backward compatibility and safe defaults.

rogue/services/scenario_evaluation_service.py (1)

78-84: Nit: Remove redundant all_results assignment

  • The all_results = self._results line is unnecessary—you can reference self._results directly in both the write and the final yield.
  • I searched the codebase for any consumers expecting a "results" status from evaluate_scenarios and found none. Changing the final yield to "done" is safe and won’t break existing usage.

No further action required.
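For context on the yield protocol referenced above, a minimal sketch of that async-generator shape, assuming illustrative status strings and result types rather than the actual service code:

```python
from typing import Any, AsyncGenerator


class ScenarioEvaluationService:
    def __init__(self, scenarios: list[str]) -> None:
        self._scenarios = scenarios
        self._results: list[dict[str, Any]] = []

    async def evaluate_scenarios(self) -> AsyncGenerator[tuple[str, Any], None]:
        """Yield ("status", ...) updates per scenario, then a final ("done", results)."""
        for scenario in self._scenarios:
            # ... call the evaluated agent and judge the outcome here ...
            self._results.append({"scenario": scenario, "passed": True})
            yield ("status", f"finished {scenario}")
        # Final yield switched from "results" to "done" in this PR.
        yield ("done", self._results)
```

A consumer such as the scenario runner can then iterate with async for and treat "done" as the termination signal.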

rogue/ui/app.py (1)

42-42: LGTM! Consistent integration of parallel_runs parameter.

The parallel_runs parameter is properly integrated into the UI state management following the same pattern as other configuration parameters.

Also applies to: 118-118, 148-148

rogue/ui/components/config_screen.py (1)

215-239: LGTM! Proper integration into configuration save/load logic.

The parallel_runs parameter is correctly integrated into the configuration saving, validation, and component state management.

Also applies to: 294-320

rogue/models/evaluation_result.py (1)

24-32: Ensure simplified result handling covers all use cases

Briefly, searches for external calls to the old combine class method and for scenario-based uses of add_result returned no hits. However, absence of references doesn’t guarantee that no parts of the codebase (or tests) rely on recursive merging or deduplication. Please manually verify:

  • rogue/models/evaluation_result.py (lines 24–32):
    • add_result: confirm no existing logic elsewhere depends on automatic merging or deduplication.
    • combine: ensure no downstream modules expect recursive flattening of nested EvaluationResults.
  • All test suites: verify that duplicate results aren’t being introduced or that existing tests cover parallel/duplicate-scenario evaluations.
  • Any data-processing pipelines or consumers of EvaluationResults: check for assumptions about result uniqueness or structure.
rogue/ui/components/scenario_runner.py (1)

20-26: Well-implemented batch splitting function.

The function correctly handles edge cases and uses numpy's array_split for even distribution of scenarios across batches.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (2)
rogue/ui/components/scenario_runner.py (2)

158-162: LGTM: Proper use of asyncio.gather for concurrent execution.

The implementation correctly uses asyncio.gather() to run worker tasks concurrently, which addresses the previous review suggestion.


28-28: Breaking change: Function signature modified.

This is a duplicate of the previous review comment. The return signature was changed from 3 to 2 values, requiring updates to callers.

Also applies to: 220-220

🧹 Nitpick comments (2)
rogue/ui/components/scenario_runner.py (2)

20-25: Handle edge case for empty batch results.

The function correctly validates inputs, but consider the edge case where np.array_split might create empty arrays for very small scenario lists with large n values.

-    return [arr.tolist() for arr in np.array_split(scenarios, n) if arr.size > 0]
+    batches = [arr.tolist() for arr in np.array_split(scenarios, n) if arr.size > 0]
+    return batches if batches else [scenarios] if scenarios else []

169-191: Add timeout protection for queue processing.

The queue processing loop could potentially hang if a worker fails to send the "done" signal. Consider adding a timeout mechanism.

 while finished_workers < num_runners:
-    worker_id, update_type, data = await update_queue.get()
+    try:
+        worker_id, update_type, data = await asyncio.wait_for(
+            update_queue.get(), timeout=300  # 5 minute timeout
+        )
+    except asyncio.TimeoutError:
+        logger.error("Timeout waiting for worker updates")
+        break
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between a95864d and 3671b43.

📒 Files selected for processing (1)
  • rogue/ui/components/scenario_runner.py (2 hunks)
🔇 Additional comments (3)
rogue/ui/components/scenario_runner.py (3)

2-3: LGTM: Required imports for parallel execution.

The asyncio and numpy imports are appropriately added to support the new parallel execution functionality.


17-18: ✅ MAX_PARALLEL_RUNS aligns with UI slider limit

Verified that the parallel_runs slider in rogue/ui/components/config_screen.py is configured with maximum=10, matching MAX_PARALLEL_RUNS = 10. No further changes required.


185-186: EvaluationResults.combine modifies self in-place—no changes required

The combine method in rogue/models/evaluation_result.py extends self.results without returning a new instance, so calling all_results.combine(data) correctly merges results in-place.
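For reference, a minimal sketch of the in-place combine plus __bool__ behavior described above; field names are assumed, and the real model lives in rogue/models/evaluation_result.py:

```python
from typing import Optional

from pydantic import BaseModel, Field


class EvaluationResult(BaseModel):
    scenario: str
    passed: bool
    reason: Optional[str] = None


class EvaluationResults(BaseModel):
    results: list[EvaluationResult] = Field(default_factory=list)

    def combine(self, other: "EvaluationResults") -> None:
        # Merge in place: extend self.results rather than returning a new instance.
        if other:
            self.results.extend(other.results)

    def __bool__(self) -> bool:
        # Truthiness check added in this PR: empty result sets are falsy.
        return bool(self.results)
```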

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (4)
rogue/ui/components/scenario_runner.py (4)

37-59: Magic number calculations reduce maintainability.

The indexing logic using i * 3, i * 3 + 1, etc. makes the code hard to maintain and was flagged in previous reviews.

Consider defining a constant like COMPONENTS_PER_RUN = 3 and using structured indexing to improve code clarity.


94-98: Repeated magic number calculations are error-prone.

The UI update logic with hardcoded index calculations is repeated throughout the code and was flagged in previous reviews.

Extract this logic into helper functions to improve maintainability and reduce the risk of indexing errors.


219-219: Function signature change affects callers.

The return signature changed from (scenarios_display, status_box, run_button) to (scenarios_display, run_button), which was flagged in previous reviews as affecting callers in rogue/ui/app.py.

Ensure all callers are updated to handle the new signature.


158-161: Critical: Tasks created but not properly managed.

The list comprehension creates asyncio tasks but doesn't assign them to a variable or await them properly. This can lead to tasks running without proper lifecycle management and potential resource leaks.

Apply this fix to properly manage the worker tasks:

-        [
-            asyncio.create_task(worker(batch, i))
-            for i, batch in enumerate(scenario_batches)
-        ]
+        worker_tasks = [
+            asyncio.create_task(worker(batch, i))
+            for i, batch in enumerate(scenario_batches)
+        ]
+        
+        # Start all tasks concurrently
+        try:
+            await asyncio.gather(*worker_tasks, return_exceptions=True)
+        except Exception as e:
+            logger.error(f"Error in worker tasks: {e}")
+            # Cancel remaining tasks
+            for task in worker_tasks:
+                if not task.done():
+                    task.cancel()
🧹 Nitpick comments (1)
rogue/ui/components/scenario_runner.py (1)

17-17: Consider making MAX_PARALLEL_RUNS configurable.

While a constant is good for setting an upper limit, consider whether this should be configurable via environment variables or configuration files for different deployment scenarios.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between 3671b43 and 687e95c.

📒 Files selected for processing (1)
  • rogue/ui/components/scenario_runner.py (2 hunks)
🔇 Additional comments (2)
rogue/ui/components/scenario_runner.py (2)

2-3: LGTM: Required imports for parallel execution.

The addition of asyncio and numpy imports is appropriate for the new parallel execution functionality.


20-25: LGTM: Well-implemented batch splitting function.

The function correctly handles edge cases (empty list, non-positive n) and uses numpy for efficient array splitting. The filtering of empty arrays ensures no worker gets an empty batch.

visibility_updates[i * 3] = gr.update(visible=True)
yield tuple(visibility_updates)

# 3. --- Define and Run Worker Tasks ---
Collaborator

This logic should move to the scenario evaluation service (most of the logic in this file)

Contributor Author

Let's go for another PR on that; this one is getting a bit too long.

@drorIvry drorIvry requested a review from yuval-qf June 30, 2025 09:56
* --wip-- [skip ci]

* --wip-- [skip ci]

* --wip-- [skip ci]

* --wip-- [skip ci]

* Some fixes

* mypy fixes

---------

Co-authored-by: Yuval <[email protected]>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (3)
rogue/ui/components/scenario_runner.py (3)

113-116: Address magic number calculations in UI component indexing.

The code still uses magic numbers (i * 3, i * 3 + 1, i * 3 + 2) for component indexing, making it hard to maintain.

+COMPONENTS_PER_RUN = 3  # group, status_box, chat_display
+GROUP_IDX_OFFSET = 0
+STATUS_IDX_OFFSET = 1
+CHAT_IDX_OFFSET = 2
+
+def get_component_indices(worker_id):
+    base = worker_id * COMPONENTS_PER_RUN
+    return base + GROUP_IDX_OFFSET, base + STATUS_IDX_OFFSET, base + CHAT_IDX_OFFSET

 # Then use throughout:
-initial_updates[i * 3] = gr.update(visible=False)
-initial_updates[i * 3 + 1] = gr.update(value="", visible=True)
-initial_updates[i * 3 + 2] = gr.update(value=None, visible=True)
+group_idx, status_idx, chat_idx = get_component_indices(i)
+initial_updates[group_idx] = gr.update(visible=False)
+initial_updates[status_idx] = gr.update(value="", visible=True)
+initial_updates[chat_idx] = gr.update(value=None, visible=True)

Also applies to: 182-182, 188-190, 195-197


167-170: Critical: Worker tasks are not awaited.

The tasks are created but not stored or awaited, causing the code to proceed without waiting for completion.

-[
-    asyncio.create_task(worker(batch, i))
-    for i, batch in enumerate(scenario_batches)
-]
+worker_tasks = [
+    asyncio.create_task(worker(batch, i))
+    for i, batch in enumerate(scenario_batches)
+]
+# The tasks will be awaited implicitly through the queue processing loop

177-198: Add timeout protection to queue processing.

The loop could hang indefinitely if a worker crashes before sending "done".

 while finished_workers < num_runners:
-    worker_id, update_type, data = await update_queue.get()
+    try:
+        worker_id, update_type, data = await asyncio.wait_for(
+            update_queue.get(), timeout=300  # 5 minutes timeout
+        )
+    except asyncio.TimeoutError:
+        logger.error("Timeout waiting for worker updates")
+        break
🧹 Nitpick comments (4)
rogue/services/scenario_evaluation_service.py (1)

20-20: Consider making MAX_SAMPLES configurable.

The hardcoded limit of 10 samples might be too restrictive for comprehensive testing. Consider making this configurable through the service constructor or configuration.

rogue/prompt_injection_evaluator/run_prompt_injection_evaluator.py (3)

54-75: Consider improving data part handling.

When handling data parts, json.dumps might produce verbose output. Consider adding a size limit or summary representation for large data objects.

 elif p.root.kind == "data":
-    text += json.dumps(p.root.data)
+    data_str = json.dumps(p.root.data)
+    if len(data_str) > 1000:
+        text += f"[Data object: {len(data_str)} chars]"
+    else:
+        text += data_str

94-94: Address TODO: Use Pydantic model for response format.

Using a Pydantic model would provide better type safety and validation for the LLM response.

Would you like me to generate a Pydantic model for the response format and update the completion call?
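If that Pydantic model were added, it might look roughly like this; the field names (injection_succeeded, reasoning) are assumptions for illustration, not the actual resolution of the TODO:

```python
from pydantic import BaseModel, Field


class PromptInjectionJudgment(BaseModel):
    """Assumed shape for the judge LLM's structured verdict."""

    injection_succeeded: bool = Field(
        description="Whether the payload altered the agent's behavior."
    )
    reasoning: str = Field(description="Short justification from the judge.")


# The completion call could then validate the raw response against this schema, e.g.:
# judgment = PromptInjectionJudgment.model_validate_json(response_text)
```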


165-165: Optimize logging with lazy formatting.

Avoid using f-strings in logging statements for better performance. The logger can format the message lazily.

-logger.info(f"Response: {response}", extra={"response": response})
+logger.info("Response: %s", response, extra={"response": response})
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between 0d40584 and c1c2a79.

📒 Files selected for processing (9)
  • rogue/__init__.py (1 hunks)
  • rogue/common/remote_agent_connection.py (3 hunks)
  • rogue/models/config.py (2 hunks)
  • rogue/models/prompt_injection.py (1 hunks)
  • rogue/prompt_injection_evaluator/__init__.py (1 hunks)
  • rogue/prompt_injection_evaluator/run_prompt_injection_evaluator.py (1 hunks)
  • rogue/services/llm_service.py (2 hunks)
  • rogue/services/scenario_evaluation_service.py (3 hunks)
  • rogue/ui/components/scenario_runner.py (3 hunks)
✅ Files skipped from review due to trivial changes (3)
  • rogue/__init__.py
  • rogue/prompt_injection_evaluator/__init__.py
  • rogue/models/prompt_injection.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • rogue/services/llm_service.py
  • rogue/models/config.py
🧰 Additional context used
🧠 Learnings (2)
rogue/ui/components/scenario_runner.py (1)
Learnt from: CR
PR: qualifire-dev/qualifire#0
File: .cursor/rules/taskmaster.mdc:0-0
Timestamp: 2025-06-23T12:36:58.787Z
Learning: Taskmaster's AI-powered tools (parse_prd, analyze_project_complexity, update_subtask, update_task, update, expand_all, expand_task, add_task) may take up to a minute to complete due to AI processing; always inform users to wait during these operations.
rogue/common/remote_agent_connection.py (1)
Learnt from: CR
PR: qualifire-dev/qualifire#0
File: .cursor/rules/tech-stack.mdc:0-0
Timestamp: 2025-06-23T12:37:02.487Z
Learning: Use httpx for making HTTP requests.
🔇 Additional comments (3)
rogue/common/remote_agent_connection.py (1)

1-1: LGTM! Good use of explicit type aliases.

Using TypeAlias makes the type aliases explicit and improves code clarity. This is the recommended approach for Python 3.10+.

Also applies to: 32-45, 47-48
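As a quick illustration of the pattern (the alias names below are examples, not the ones defined in remote_agent_connection.py):

```python
from typing import Callable, TypeAlias

# Explicit aliases read as type definitions rather than ordinary assignments.
TaskCallbackArg: TypeAlias = tuple[str, dict[str, str]]
TaskUpdateCallback: TypeAlias = Callable[[TaskCallbackArg], None]
```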

rogue/prompt_injection_evaluator/run_prompt_injection_evaluator.py (1)

24-51: LGTM! Well-structured evaluation prompt.

The prompt template clearly defines the evaluation criteria and expected output format.

rogue/ui/components/scenario_runner.py (1)

18-38: LGTM! Well-implemented batch splitting logic.

The function correctly handles edge cases including empty lists, remainder distribution, and avoids empty batches.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
rogue/evaluator_agent/run_evaluator_agent.py (1)

92-92: LGTM! Clean replacement of local function with centralized utility.

The use of get_auth_header correctly replaces the local header construction logic with the centralized implementation.

For consistency, consider aligning the type annotation for auth_credentials parameter with the imported function:

-    auth_credentials: str | None,
+    auth_credentials: Optional[str],

Both str | None and Optional[str] are functionally equivalent, but using consistent typing across related functions improves readability.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between 4951180 and 0d9579c.

📒 Files selected for processing (2)
  • rogue/evaluator_agent/run_evaluator_agent.py (2 hunks)
  • rogue/models/config.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • rogue/models/config.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
rogue/evaluator_agent/run_evaluator_agent.py (1)
rogue/models/config.py (2)
  • AuthType (7-11)
  • get_auth_header (40-52)
🔇 Additional comments (1)
rogue/evaluator_agent/run_evaluator_agent.py (1)

18-18: LGTM! Good refactoring to centralize authentication logic.

The import of get_auth_header aligns with the centralization of authentication header construction logic.
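As a hedged illustration of the new call site (the exact wiring in run_evaluator_agent.py may differ, and the AuthType member name follows the sketch earlier in this thread; httpx is used per the tech-stack learning quoted above):

```python
import httpx

from rogue.models.config import AuthType, get_auth_header

# Hypothetical usage: build the Authorization header once, hand it to the shared client.
headers = get_auth_header(AuthType.BEARER_TOKEN, "example-token")
client = httpx.AsyncClient(base_url="http://localhost:10001", headers=headers)
```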

@yuval-qf yuval-qf enabled auto-merge (squash) July 1, 2025 19:41
@yuval-qf yuval-qf merged commit 2eddb3e into main Jul 1, 2025
2 checks passed
@yuval-qf yuval-qf deleted the feature/parallel-runners branch July 1, 2025 19:43
yuval-qf added a commit that referenced this pull request Oct 8, 2025
yuval-qf added a commit that referenced this pull request Oct 8, 2025
* Reduce rogue startup import time

* Reduce rogue startup import time #2

* Reduce rogue startup import time #3