Skip to content

Conversation

jan-janssen
Copy link
Member

@jan-janssen jan-janssen commented Aug 13, 2025

Summary by CodeRabbit

  • New Features

    • Known-hosts handling for remote connections is now optional; when not provided, unknown hosts are accepted to simplify first-time connections while existing known-hosts behavior stays the same.
  • Tests

    • Expanded test coverage and added host-based remote configuration to validate remote connection and authentication scenarios.

Copy link

coderabbitai bot commented Aug 13, 2025

Warning

Rate limit exceeded

@jan-janssen has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 12 minutes and 45 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 8868a61 and 6dd12d0.

📒 Files selected for processing (1)
  • tests/config/remote_rebex/queue.yaml (0 hunks)

Walkthrough

Make known_hosts handling optional in RemoteQueueAdapter: _ssh_known_hosts is set to an absolute path if provided in config or an empty string "" otherwise. _open_ssh_connection loads host keys only when _ssh_known_hosts is non-empty; otherwise it sets AutoAddPolicy before connecting.

Changes

Cohort / File(s) Summary
SSH host key handling
pysqa/base/remote.py
_ssh_known_hosts now becomes os.path.abspath(config["known_hosts"]) when present or "" when absent. _open_ssh_connection conditionally calls ssh.load_host_keys(...) if len(self._ssh_known_hosts) > 0; otherwise it calls ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy()). Public signatures unchanged.
Host-based test configs
tests/config/remote_rebex_hosts/queue.yaml
New test config defining a REMOTE queue pointing at test.rebex.net with credentials, known_hosts path, paths, ssh_continous_connection flag, port, python_executable and resource limits.
Tests updated to use host config
tests/test_remote.py, tests/test_remote_auth.py
Multiple tests switched from config/remote_rebex to config/remote_rebex_hosts. Added host-based remote command tests (individual and continuous connection) and adjusted auth tests to use the host-based config.

Sequence Diagram(s)

sequenceDiagram
  participant Caller as Test / User
  participant RQA as RemoteQueueAdapter
  participant SSH as Paramiko SSHClient

  Caller->>RQA: __init__(config)
  alt config contains known_hosts
    RQA->>RQA: _ssh_known_hosts = abs(path)
  else
    RQA->>RQA: _ssh_known_hosts = ""
  end

  Caller->>RQA: _open_ssh_connection()
  alt _ssh_known_hosts != ""
    RQA->>SSH: load_host_keys(_ssh_known_hosts)
  else
    RQA->>SSH: set_missing_host_key_policy(AutoAddPolicy)
  end
  RQA->>SSH: connect(...)
  SSH-->>RQA: connection/session
  RQA-->>Caller: client/session
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15 minutes

Possibly related PRs

  • Add remote test #441: Adjusts SSH connection behavior in pysqa/base/remote.py (related changes to host-key handling and key lookup).

Poem

A rabbit taps keys with whiskers bright,
Known hosts or blank, I choose the light.
If a path is given, I double-check the scroll;
If empty, AutoAdd helps the tunnel roll.
Hopping through SSH, I deliver each goal. 🐇🔐

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch known_hosts

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

codecov bot commented Aug 13, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.39%. Comparing base (fafe096) to head (6dd12d0).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #451      +/-   ##
==========================================
+ Coverage   92.36%   92.39%   +0.03%     
==========================================
  Files          17       17              
  Lines        1008     1012       +4     
==========================================
+ Hits          931      935       +4     
  Misses         77       77              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (1)
pysqa/base/remote.py (1)

448-451: Inconsistent and insecure host-key policy for proxy connection

For the proxy (bastion) connection, AutoAddPolicy is hardcoded, bypassing the known_hosts behavior and opt-in policy you established earlier. This is a security hole and inconsistent with the rest of the connection setup.

Align the proxy client policy with the main client:

-            client_new = paramiko.SSHClient()
-            client_new.set_missing_host_key_policy(paramiko.AutoAddPolicy())
+            client_new = paramiko.SSHClient()
+            client_new.load_system_host_keys()
+            if self._ssh_known_hosts is not None:
+                try:
+                    client_new.load_host_keys(self._ssh_known_hosts)
+                except FileNotFoundError:
+                    warnings.warn(
+                        f"known_hosts file not found at '{self._ssh_known_hosts}'. Using system host keys only.",
+                        stacklevel=2,
+                    )
+            if getattr(self, "_ssh_auto_add_unknown_hosts", False):
+                client_new.set_missing_host_key_policy(paramiko.AutoAddPolicy())
+            else:
+                client_new.set_missing_host_key_policy(paramiko.RejectPolicy())
🧹 Nitpick comments (2)
pysqa/base/remote.py (2)

28-35: Update attributes docstring to reflect Optional known_hosts and new policy flag

doc drift: _ssh_known_hosts can now be None; document it. Also document the new ssh_auto_add_unknown_hosts flag if you adopt it.

Apply this doc update:

-        _ssh_known_hosts (str): The path to the known hosts file.
+        _ssh_known_hosts (Optional[str]): Path to the known hosts file. None uses system host keys.
+        _ssh_auto_add_unknown_hosts (bool): If True, auto-accept and add unknown host keys (unsafe by default).

351-469: Optional: Persist auto-added host keys to disk when a known_hosts path is configured

Right now, AutoAddPolicy only amends the in-memory host keys for the client session. If you intend to "automatically manage known hosts keys," consider persisting newly seen keys to the configured known_hosts file so future sessions verify them.

You can add a small helper and call it after each successful ssh.connect(...):

def _persist_server_key(self, ssh: paramiko.SSHClient) -> None:
    if not getattr(self, "_ssh_auto_add_unknown_hosts", False):
        return
    if not self._ssh_known_hosts:
        return
    transport = get_transport(ssh)
    server_key = transport.get_remote_server_key()
    hostkeys = paramiko.HostKeys()
    if os.path.exists(self._ssh_known_hosts):
        hostkeys.load(self._ssh_known_hosts)
    hostkeys.add(self._ssh_host, server_key.get_name(), server_key)
    os.makedirs(os.path.dirname(self._ssh_known_hosts), exist_ok=True)
    hostkeys.save(self._ssh_known_hosts)

Then, right after each ssh.connect(...) call in _open_ssh_connection, invoke:

self._persist_server_key(ssh)

And similarly after client_new.connect(...) for the proxy.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8051c3d and 0cf2c65.

📒 Files selected for processing (1)
  • pysqa/base/remote.py (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: unittest_matrix (windows-latest, 3.13)
  • GitHub Check: unittest_slurm

Comment on lines 96 to 101
if "known_hosts" in config:
self._ssh_known_hosts = os.path.abspath(
os.path.expanduser(config["known_hosts"])
)
else:
self._ssh_known_hosts = None
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Harden known_hosts config handling; avoid empty-string bug and add explicit opt-in for auto-accepting unknown hosts

  • Using "key in config" treats an empty string as a valid path and resolves it to the current working directory, causing paramiko to try to read a non-existent file.
  • Defaulting to AutoAddPolicy downstream without explicit opt-in is a security regression; prefer system host keys by default and make auto-accept a config flag.

Proposed fix:

  • Treat empty or missing known_hosts as None.
  • Validate file existence and warn if missing.
  • Introduce ssh_auto_add_unknown_hosts (default False).

Apply this diff:

-        if "known_hosts" in config:
-            self._ssh_known_hosts = os.path.abspath(
-                os.path.expanduser(config["known_hosts"])
-            )
-        else:
-            self._ssh_known_hosts = None
+        # Known hosts: allow None, but prefer system host keys when unspecified.
+        kh = config.get("known_hosts")
+        if kh:
+            path = os.path.abspath(os.path.expanduser(kh))
+            if os.path.exists(path):
+                self._ssh_known_hosts = path
+            else:
+                warnings.warn(
+                    f"known_hosts file not found at '{path}'. Falling back to system host keys.",
+                    stacklevel=2,
+                )
+                self._ssh_known_hosts = None
+        else:
+            self._ssh_known_hosts = None
+        # Opt-in flag to auto-accept unknown host keys (unsafe by default).
+        self._ssh_auto_add_unknown_hosts = bool(
+            config.get("ssh_auto_add_unknown_hosts", False)
+        )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if "known_hosts" in config:
self._ssh_known_hosts = os.path.abspath(
os.path.expanduser(config["known_hosts"])
)
else:
self._ssh_known_hosts = None
# Known hosts: allow None, but prefer system host keys when unspecified.
kh = config.get("known_hosts")
if kh:
path = os.path.abspath(os.path.expanduser(kh))
if os.path.exists(path):
self._ssh_known_hosts = path
else:
warnings.warn(
f"known_hosts file not found at '{path}'. Falling back to system host keys.",
stacklevel=2,
)
self._ssh_known_hosts = None
else:
self._ssh_known_hosts = None
# Opt-in flag to auto-accept unknown host keys (unsafe by default).
self._ssh_auto_add_unknown_hosts = bool(
config.get("ssh_auto_add_unknown_hosts", False)
)
🤖 Prompt for AI Agents
In pysqa/base/remote.py around lines 96 to 101, the current known_hosts handling
treats an empty string as a valid path and implicitly enables AutoAddPolicy
downstream; change it to treat missing or empty known_hosts as None, add a new
boolean config key ssh_auto_add_unknown_hosts defaulting to False, and only
enable auto-accept when that flag is true. Specifically: read
config.get("known_hosts") and if value is falsy (None or empty string) set
self._ssh_known_hosts = None; if a non-empty path is provided expand and abspath
it and then check os.path.exists(path) — if it does not exist set
self._ssh_known_hosts = None and emit a warning via the module logger; add
self._ssh_auto_add_unknown_hosts = bool(config.get("ssh_auto_add_unknown_hosts",
False)) so callers can decide whether to set AutoAddPolicy; ensure default
behavior prefers loading system host keys when known_hosts is None and only use
AutoAddPolicy when the new flag is true.

Comment on lines 352 to 355
if self._ssh_known_hosts is not None:
ssh.load_host_keys(self._ssh_known_hosts)
else:
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid silently trusting unknown hosts; load system host keys and gate AutoAddPolicy behind explicit opt-in

When _ssh_known_hosts is None, switching to AutoAddPolicy unconditionally trusts unknown hosts, enabling MITM risk. Prefer:

  • Always load system host keys.
  • Optionally load a user-provided known_hosts file.
  • Use RejectPolicy by default; only AutoAdd when explicitly configured.

Apply this diff:

-        if self._ssh_known_hosts is not None:
-            ssh.load_host_keys(self._ssh_known_hosts)
-        else:
-            ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
+        # Load system-level keys first; optionally supplement with a custom file.
+        ssh.load_system_host_keys()
+        if self._ssh_known_hosts is not None:
+            try:
+                ssh.load_host_keys(self._ssh_known_hosts)
+            except FileNotFoundError:
+                warnings.warn(
+                    f"known_hosts file not found at '{self._ssh_known_hosts}'. Using system host keys only.",
+                    stacklevel=2,
+                )
+        # Decide how to handle unknown hosts
+        if getattr(self, "_ssh_auto_add_unknown_hosts", False):
+            ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
+        else:
+            ssh.set_missing_host_key_policy(paramiko.RejectPolicy())

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In pysqa/base/remote.py around lines 352-355, the code currently sets
AutoAddPolicy when self._ssh_known_hosts is None which silently trusts unknown
hosts; change it to always call ssh.load_system_host_keys(), then if
self._ssh_known_hosts is set load that file too, and set the missing host key
policy to paramiko.RejectPolicy() by default; introduce or use an explicit
boolean config flag (e.g. self._allow_auto_add_hosts defaulting to False) so
AutoAddPolicy() is only applied when that flag is True (otherwise use
RejectPolicy()).

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
tests/test_remote_auth.py (1)

22-22: Switching to host-based config looks correct; consider adding coverage for the "no known_hosts" path.

The updated QueueAdapter(directory=.../config/remote_rebex_hosts) is consistent with making known_hosts optional. To fully exercise the new behavior, add a test that sets _ssh_known_hosts to "" and asserts AutoAddPolicy is applied (without making a real network call).

Add this test method to TestRemoteQueueAdapterAuth:

def test_known_hosts_optional_auto_add_policy(self):
    # Ensures that when _ssh_known_hosts is empty, AutoAddPolicy is set
    from unittest.mock import patch

    class FakeSSHClient:
        def __init__(self, *args, **kwargs):
            self.auto_add_policy_set = False
            self.host_keys_loaded = False

        def load_host_keys(self, *args, **kwargs):
            self.host_keys_loaded = True

        def set_missing_host_key_policy(self, policy):
            # We don't depend on instance type here; just record that it was set
            self.auto_add_policy_set = True

        def connect(self, *args, **kwargs):
            # No-op to avoid real network calls
            pass

    path = os.path.dirname(os.path.abspath(__file__))
    remote = QueueAdapter(directory=os.path.join(path, "config/remote_rebex_hosts"))

    # Force the "no known_hosts" branch and a password-based auth path
    remote._adapter._ssh_known_hosts = ""
    remote._adapter._ssh_key = None
    remote._adapter._ssh_ask_for_password = False
    remote._adapter._ssh_password = remote._adapter._ssh_password or "dummy"

    with patch("pysqa.base.remote.paramiko.SSHClient", FakeSSHClient):
        ssh = remote._adapter._open_ssh_connection()
        # Validate behavior
        self.assertTrue(ssh.auto_add_policy_set)
        self.assertFalse(ssh.host_keys_loaded)

Also applies to: 29-29, 36-36

tests/test_remote.py (2)

162-170: Nit: fix typo in test name and ensure the persistent connection is closed.

  • Typo: “continous” → “continuous” in the test name.
  • Since you explicitly open a persistent connection, close it in a finally block to avoid leaking connections in CI.

Apply this diff:

-def test_remote_command_continous_connection_hosts(self):
+def test_remote_command_continuous_connection_hosts(self):
     path = os.path.dirname(os.path.abspath(__file__))
     remote = QueueAdapter(directory=os.path.join(path, "config/remote_rebex_hosts"))
     remote._adapter._ssh_remote_path = path
     remote._adapter._ssh_continous_connection = True
-    remote._adapter._open_ssh_connection()
-    output = remote._adapter._execute_remote_command(command="pwd")
-    self.assertEqual(output, "/\n")
+    ssh = remote._adapter._open_ssh_connection()
+    try:
+        output = remote._adapter._execute_remote_command(command="pwd")
+        self.assertEqual(output, "/\n")
+    finally:
+        if ssh is not None:
+            ssh.close()
+            remote._adapter._ssh_connection = None

Note: The attribute name _ssh_continous_connection is misspelled in production code; keep it as-is here to avoid breaking behavior. If you plan to fix the spelling in code later, deprecate with a compatibility shim.


173-173: LGTM on switching remaining tests to host-based config; consider DRYing repeated path construction.

All these tests now initialize via config/remote_rebex_hosts, which aligns with the new known_hosts handling. To reduce duplication and improve maintainability, compute the hosts config path once in setUpClass or setUp of TestRemoteQueueAdapterRebex and reuse it.

Example refactor (outside this hunk):

@classmethod
def setUpClass(cls):
    cls.path = os.path.dirname(os.path.abspath(__file__))
    cls.hosts_dir = os.path.join(cls.path, "config/remote_rebex_hosts")

def _make_remote(self):
    return QueueAdapter(directory=self.hosts_dir)

# Usage in tests:
remote = self._make_remote()

Alternatively, parameterize over both config/remote_rebex and config/remote_rebex_hosts using subTest to collapse duplicated tests.

Also applies to: 180-180, 186-186, 193-193, 200-200

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 41a42a0 and 8868a61.

📒 Files selected for processing (3)
  • tests/config/remote_rebex_hosts/queue.yaml (1 hunks)
  • tests/test_remote.py (1 hunks)
  • tests/test_remote_auth.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tests/config/remote_rebex_hosts/queue.yaml
🧰 Additional context used
🧬 Code Graph Analysis (2)
tests/test_remote_auth.py (2)
pysqa/queueadapter.py (1)
  • QueueAdapter (13-384)
pysqa/base/remote.py (1)
  • _open_ssh_connection (344-468)
tests/test_remote.py (2)
pysqa/queueadapter.py (2)
  • QueueAdapter (13-384)
  • submit_job (188-231)
pysqa/base/remote.py (5)
  • _open_ssh_connection (344-468)
  • _execute_remote_command (559-582)
  • submit_job (147-183)
  • transfer_file (266-291)
  • get_transport (680-684)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: unittest_matrix (windows-latest, 3.13)
  • GitHub Check: unittest_slurm
  • GitHub Check: unittest_matrix (macos-latest, 3.13)

Comment on lines +154 to +161
def test_remote_command_individual_connections_hosts(self):
path = os.path.dirname(os.path.abspath(__file__))
remote = QueueAdapter(directory=os.path.join(path, "config/remote_rebex_hosts"))
remote._adapter._ssh_remote_path = path
remote._adapter._open_ssh_connection()
output = remote._adapter._execute_remote_command(command="pwd")
self.assertEqual(output, "/\n")

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid opening a redundant SSH connection in the individual-connections host test.

When _ssh_continous_connection is False, _execute_remote_command opens and closes its own connection. Calling _open_ssh_connection beforehand (Line 158) creates an unused connection and may leak resources.

Apply this diff:

 def test_remote_command_individual_connections_hosts(self):
     path = os.path.dirname(os.path.abspath(__file__))
     remote = QueueAdapter(directory=os.path.join(path, "config/remote_rebex_hosts"))
     remote._adapter._ssh_remote_path = path
-    remote._adapter._open_ssh_connection()
     output = remote._adapter._execute_remote_command(command="pwd")
     self.assertEqual(output, "/\n")

Optionally, make the same change in test_remote_command_individual_connections above (Lines 141–143) for consistency.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def test_remote_command_individual_connections_hosts(self):
path = os.path.dirname(os.path.abspath(__file__))
remote = QueueAdapter(directory=os.path.join(path, "config/remote_rebex_hosts"))
remote._adapter._ssh_remote_path = path
remote._adapter._open_ssh_connection()
output = remote._adapter._execute_remote_command(command="pwd")
self.assertEqual(output, "/\n")
def test_remote_command_individual_connections_hosts(self):
path = os.path.dirname(os.path.abspath(__file__))
remote = QueueAdapter(directory=os.path.join(path, "config/remote_rebex_hosts"))
remote._adapter._ssh_remote_path = path
output = remote._adapter._execute_remote_command(command="pwd")
self.assertEqual(output, "/\n")
🤖 Prompt for AI Agents
In tests/test_remote.py around lines 154 to 161, the test calls
remote._adapter._open_ssh_connection() before executing the command which is
redundant when _ssh_continous_connection is False and causes a leaked unused
connection; remove the explicit call to _open_ssh_connection() (line 158) so
_execute_remote_command manages its own connect/disconnect, and also apply the
same removal to the similar test at lines ~141–143 for consistency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant