feat!: proposal for dynamic partition selection #321
Conversation
Walkthrough

This update introduces an automatic SLURM partition selection feature for the executor plugin, enabling dynamic partition assignment based on job resource requirements and a user-supplied YAML configuration. The implementation includes new logic for partition scoring and selection, updates to the executor's control flow, and comprehensive tests validating partition parsing and matching.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Executor
    participant PartitionsModule
    participant SLURM
    User->>Executor: Submit job with resource requirements
    Executor->>PartitionsModule: Load partition config (if provided)
    Executor->>PartitionsModule: get_best_partition(job)
    PartitionsModule->>PartitionsModule: Score job fit for each partition
    PartitionsModule-->>Executor: Return best partition (or None)
    Executor->>SLURM: Submit job with selected partition (if any)
    SLURM-->>Executor: Job scheduled
    Executor-->>User: Job submission result
```
Actionable comments posted: 4
🧹 Nitpick comments (3)
snakemake_executor_plugin_slurm/partitions.py (3)
**44-53**: Use appropriate logging levels.

The function uses `logger.warning` for informational messages about partition selection. These are normal operational messages, not warnings.

```diff
-            logger.warning(
+            logger.info(
                 f"Auto-selected partition '{partition}' for job {job.name} "
                 f"with score {best_score:.3f}"
             )
             return partition
         else:
             logger.warning(
                 f"No suitable partition found for job {job.name} based on "
                 f"resource requirements. Falling back to default behavior."
             )
```
**248-250**: Simplify nested if statements.

As suggested by static analysis, these nested if statements can be combined for better readability.

```diff
-        if gpu_model and self.limits.available_gpu_models:
-            if gpu_model not in self.limits.available_gpu_models:
-                return None
+        if gpu_model and self.limits.available_gpu_models and gpu_model not in self.limits.available_gpu_models:
+            return None
```

🧰 Tools
🪛 Ruff (0.11.9)

248-249: Use a single `if` statement instead of nested `if` statements (SIM102)
**184-223**: Consider documenting and refining the scoring algorithm.

The current scoring algorithm simply sums the ratios of requested resources to limits, giving equal weight to all resources. This may not reflect real-world partition selection priorities.
Consider:
- Adding weights to different resources based on their importance
- Documenting the scoring rationale in the method docstring
- Making the scoring configurable through the partition configuration
For example, memory and GPU availability might be more critical than other factors in partition selection.
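For instance, a weighted variant of the scoring could look like the sketch below. All names here (`weighted_score`, the `weights` dict) are hypothetical illustrations of the suggestion, not the PR's actual API:

```python
# Hypothetical sketch of a weighted variant of the ratio-sum scoring.
# Names (weighted_score, weights) are illustrative, not the PR's actual API.
from math import isinf

def weighted_score(requested: dict, limits: dict, weights: dict) -> float:
    """Sum weighted ratios of requested resources to partition limits."""
    score = 0.0
    for resource, req in requested.items():
        limit = limits.get(resource, float("inf"))
        if req and not isinf(limit) and limit > 0:
            # Heavier-weighted resources (e.g. memory, GPUs) dominate the fit.
            score += weights.get(resource, 1.0) * (req / limit)
    return score

# Example: with memory weighted 2x, the memory ratio dominates the score.
print(weighted_score(
    {"cpus": 8, "mem_mb": 16000},
    {"cpus": 64, "mem_mb": 64000},
    {"mem_mb": 2.0},
))  # 0.125 + 2 * 0.25 = 0.625
```

Exposing the weights through the partition configuration would let cluster admins tune selection without code changes.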
🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 184-184: Too many return statements (8/6)
(R0911)
[refactor] 184-184: Too many branches (21/12)
(R0912)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- docs/further.md (1 hunks)
- snakemake_executor_plugin_slurm/__init__.py (6 hunks)
- snakemake_executor_plugin_slurm/partitions.py (1 hunks)
- snakemake_executor_plugin_slurm/submit_string.py (0 hunks)
- tests/tests.py (2 hunks)
💤 Files with no reviewable changes (1)
- snakemake_executor_plugin_slurm/submit_string.py
🧰 Additional context used
🧬 Code Graph Analysis (2)
snakemake_executor_plugin_slurm/__init__.py (1)
- snakemake_executor_plugin_slurm/partitions.py (2)
  - read_partition_file (13-27)
  - get_best_partition (30-54)

tests/tests.py (4)
- snakemake_executor_plugin_slurm/__init__.py (1)
  - ExecutorSettings (36-119)
- snakemake_executor_plugin_slurm/utils.py (1)
  - set_gres_string (51-123)
- snakemake_executor_plugin_slurm/submit_string.py (1)
  - get_submit_command (5-69)
- snakemake_executor_plugin_slurm/partitions.py (2)
  - read_partition_file (13-27)
  - get_best_partition (30-54)
🪛 Pylint (3.3.7)
tests/tests.py
[refactor] 485-485: Too many public methods (28/20)
(R0904)
snakemake_executor_plugin_slurm/partitions.py
[refactor] 41-54: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
[refactor] 78-88: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 85-88: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 115-144: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 93-93: Too many branches (14/12)
(R0912)
[refactor] 150-150: Too many instance attributes (13/7)
(R0902)
[refactor] 184-184: Too many return statements (8/6)
(R0911)
[refactor] 184-184: Too many branches (21/12)
(R0912)
🪛 Ruff (0.11.9)
snakemake_executor_plugin_slurm/partitions.py
248-249: Use a single `if` statement instead of nested `if` statements (SIM102)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: testing
🔇 Additional comments (4)
docs/further.md (1)
**67-139**: Well-documented feature with clear examples!

The documentation for the automatic partition selection feature is comprehensive and well-structured. The table of configurable limits, example YAML configuration, and explanation of the scoring algorithm provide users with all the necessary information to effectively use this feature.
tests/tests.py (1)
**485-974**: Excellent test coverage for partition selection feature!

The test suite comprehensively covers all aspects of the partition selection feature:
- Various partition configuration scenarios (minimal, comprehensive, invalid)
- Edge cases (empty partitions, missing keys, invalid YAML)
- Different job resource requirements (CPU, GPU, MPI, constraints)
- Proper error handling and logging verification
The use of pytest fixtures and parametrization makes the tests maintainable and easy to extend.
🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 485-485: Too many public methods (28/20)
(R0904)
snakemake_executor_plugin_slurm/__init__.py (1)
**639-652**: Well-implemented partition selection priority logic!

The enhanced `get_partition_arg` method correctly implements the partition selection priority:
- Explicit partition specification (via the `slurm_partition` resource)
- Automatic selection based on loaded partition config
- Fallback to default partition behavior

This maintains backward compatibility while adding the new automatic selection feature.
🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 649-652: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
snakemake_executor_plugin_slurm/partitions.py (1)
**149-175**: Well-designed dataclass for partition limits.

The `PartitionLimits` dataclass appropriately models SLURM partition resource constraints with sensible defaults (infinity for unlimited, 0 for GPUs).

🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 150-150: Too many instance attributes (13/7)
(R0902)
```python
    partition_config: Optional[Path] = field(
        default=None,
        metadata={
            "help": "Path to YAML file defining partition limits for automatic partition selection. "
            "When provided, jobs will be automatically assigned to the best-fitting partition based on "
            "See documentation for complete list of available limits.",
            "env_var": False,
            "required": False,
        },
    )
```
Fix incomplete help text for partition_config.

The help text appears to be incomplete. It currently reads:

> "When provided, jobs will be automatically assigned to the best-fitting partition based on "

This sentence is missing its conclusion.

```diff
-            "help": "Path to YAML file defining partition limits for automatic partition selection. "
-            "When provided, jobs will be automatically assigned to the best-fitting partition based on "
-            "See documentation for complete list of available limits.",
+            "help": "Path to YAML file defining partition limits for automatic partition selection. "
+            "When provided, jobs will be automatically assigned to the best-fitting partition based on "
+            "their resource requirements. See documentation for complete list of available limits.",
```
```python
    if cpus_per_task < 0:
        return (0, "none")
```
🛠️ Refactor suggestion

Consider raising an error for negative CPU values.

When `cpus_per_task` is negative, the function returns `(0, "none")`. This silently ignores an invalid configuration that likely indicates a user error.

```diff
     if cpus_per_task < 0:
-        return (0, "none")
+        raise WorkflowError(f"Invalid negative value for cpus_per_task: {cpus_per_task}")
```
```python
    # Parse gres string format: gpu:<number> or gpu:<model>:<number>
    gpu_parts = [part for part in gres.split(",") if part.strip().startswith("gpu")]
    if gpu_parts:
        gpu_spec = gpu_parts[0].strip().split(":")
        if len(gpu_spec) == 2:  # gpu:<number>
            return int(gpu_spec[1]), None
        elif len(gpu_spec) == 3:  # gpu:<model>:<number>
            return int(gpu_spec[2]), gpu_spec[1]
```
Add error handling for gres GPU parsing.
The gres parsing logic has potential issues:
- Integer conversion could fail if the number part isn't numeric
- Only the first GPU specification is processed from comma-separated values
- No validation for unexpected formats
```diff
     # Parse gres string format: gpu:<number> or gpu:<model>:<number>
     gpu_parts = [part for part in gres.split(",") if part.strip().startswith("gpu")]
     if gpu_parts:
         gpu_spec = gpu_parts[0].strip().split(":")
         if len(gpu_spec) == 2:  # gpu:<number>
-            return int(gpu_spec[1]), None
+            try:
+                return int(gpu_spec[1]), None
+            except ValueError:
+                raise WorkflowError(f"Invalid GPU count in gres: {gpu_spec[1]}")
         elif len(gpu_spec) == 3:  # gpu:<model>:<number>
-            return int(gpu_spec[2]), gpu_spec[1]
+            try:
+                return int(gpu_spec[2]), gpu_spec[1]
+            except ValueError:
+                raise WorkflowError(f"Invalid GPU count in gres: {gpu_spec[2]}")
+        else:
+            raise WorkflowError(f"Invalid gres GPU format: {gpu_parts[0]}")
```
```python
def read_partition_file(partition_file: Path) -> List["Partition"]:
    with open(partition_file, "r") as f:
        out = []
        partitions_dict = yaml.safe_load(f)["partitions"]
        for partition_name, partition_config in partitions_dict.items():
            if not partition_name or not partition_name.strip():
                raise KeyError("Partition name cannot be empty")

            out.append(
                Partition(
                    name=partition_name,
                    limits=PartitionLimits(**partition_config),
                )
            )
        return out
```
Add error handling for file operations and YAML parsing.
The function lacks error handling for several potential failures:
- File not found or permission errors
- Invalid YAML syntax
- Missing "partitions" key
- Invalid partition configuration structure
Consider adding comprehensive error handling:
```diff
 def read_partition_file(partition_file: Path) -> List["Partition"]:
-    with open(partition_file, "r") as f:
-        out = []
-        partitions_dict = yaml.safe_load(f)["partitions"]
+    try:
+        with open(partition_file, "r") as f:
+            config = yaml.safe_load(f)
+    except FileNotFoundError:
+        raise WorkflowError(f"Partition configuration file not found: {partition_file}")
+    except yaml.YAMLError as e:
+        raise WorkflowError(f"Invalid YAML in partition configuration: {e}")
+    except Exception as e:
+        raise WorkflowError(f"Error reading partition configuration: {e}")
+
+    if not isinstance(config, dict) or "partitions" not in config:
+        raise WorkflowError("Partition configuration must contain a 'partitions' key")
+
+    partitions_dict = config["partitions"]
+    if not isinstance(partitions_dict, dict):
+        raise WorkflowError("'partitions' must be a dictionary")
+
+    out = []
     for partition_name, partition_config in partitions_dict.items():
         if not partition_name or not partition_name.strip():
             raise KeyError("Partition name cannot be empty")

+        if not isinstance(partition_config, dict):
+            raise WorkflowError(f"Configuration for partition '{partition_name}' must be a dictionary")
+
         out.append(
             Partition(
                 name=partition_name,
                 limits=PartitionLimits(**partition_config),
             )
         )
     return out
```
Actionable comments posted: 0
♻️ Duplicate comments (3)
snakemake_executor_plugin_slurm/partitions.py (3)
**13-27**: Add error handling for file operations and YAML parsing.

The function lacks error handling for several potential failures, as previously noted.

**81-89**: Add error handling for gres GPU parsing.

The gres parsing logic needs error handling for integer conversion failures and format validation, as previously noted.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 85-88: Unnecessary "elif" after "return", remove the leading "el" from "elif" (R1705)

**125-126**: Consider raising an error for negative CPU values.

When `cpus_per_task` is negative, raising an error would better signal the configuration issue.
🧹 Nitpick comments (2)
snakemake_executor_plugin_slurm/partitions.py (2)
**248-250**: Simplify nested if statements.

Combine the nested conditions for better readability.

```diff
-        if gpu_model and self.limits.available_gpu_models:
-            if gpu_model not in self.limits.available_gpu_models:
-                return None
+        if gpu_model and self.limits.available_gpu_models and gpu_model not in self.limits.available_gpu_models:
+            return None
```

🧰 Tools
🪛 Ruff (0.11.9)

248-249: Use a single `if` statement instead of nested `if` statements (SIM102)

**190-223**: The resource ratio scoring approach promotes efficient partition utilization.

The scoring algorithm effectively:
- Favors partitions that closely match job requirements
- Prevents small jobs from monopolizing large partitions
- Handles unlimited resources gracefully with `isinf` checks

This should lead to better overall cluster utilization.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- snakemake_executor_plugin_slurm/__init__.py (7 hunks)
- snakemake_executor_plugin_slurm/partitions.py (1 hunks)
- tests/tests.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- snakemake_executor_plugin_slurm/__init__.py
🧰 Additional context used
🪛 Pylint (3.3.7)
tests/tests.py
[refactor] 486-486: Too many public methods (28/20)
(R0904)
snakemake_executor_plugin_slurm/partitions.py
[refactor] 41-54: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
[refactor] 78-88: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 85-88: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 115-144: Unnecessary "elif" after "return", remove the leading "el" from "elif"
(R1705)
[refactor] 93-93: Too many branches (14/12)
(R0912)
[refactor] 150-150: Too many instance attributes (13/7)
(R0902)
[refactor] 184-184: Too many return statements (8/6)
(R0911)
[refactor] 184-184: Too many branches (21/12)
(R0912)
🪛 Ruff (0.11.9)
snakemake_executor_plugin_slurm/partitions.py
248-249: Use a single `if` statement instead of nested `if` statements (SIM102)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: testing
🔇 Additional comments (4)
tests/tests.py (2)
**6-9**: LGTM! Appropriate imports for partition testing functionality.

The new imports support the comprehensive test suite for the partition selection feature.

Also applies to: 13-16

**486-976**: Excellent test coverage for the partition selection feature!

The test suite comprehensively covers:
- Various partition configurations (basic, minimal, comprehensive)
- Different job types (CPU-only, GPU, MPI)
- Edge cases and error conditions
- Partition scoring and selection logic

The tests ensure robust validation of the new automatic partition selection functionality.
🧰 Tools
🪛 Pylint (3.3.7)
[refactor] 486-486: Too many public methods (28/20)
(R0904)
snakemake_executor_plugin_slurm/partitions.py (2)
**30-55**: Well-implemented partition selection logic!

The function correctly:
- Filters out incompatible partitions
- Selects the best-scoring partition
- Provides informative logging for both success and failure cases

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 41-54: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it (R1705)

**149-175**: Well-structured partition limits dataclass!

The dataclass appropriately:
- Uses `inf` for unlimited numerical resources
- Defaults `max_gpu` to 0 for non-GPU partitions
- Organizes resources by type (standard, SLURM-specific, GPU, MPI, constraints)

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 150-150: Too many instance attributes (13/7) (R0902)
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/tests.py (1)
**547-550**: Improve test case for missing required fields.

The current test uses an empty partition name `""` but doesn't truly test missing required fields. Consider testing with a partition that completely lacks the name field.

```diff
 @pytest.fixture
 def missing_name_config(self):
     """Configuration with missing name field."""
-    return {"partitions": {"": {}}}  # Empty partition name
+    return {"partitions": {None: {}}}  # None as key to test missing name handling
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- tests/tests.py (2 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
tests/tests.py
[refactor] 486-486: Too many public methods (28/20)
(R0904)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: testing
🔇 Additional comments (6)
tests/tests.py (6)
**6-8**: LGTM! Necessary imports for partition testing.

The added imports for `tempfile`, `yaml`, and `Path` are appropriate for creating temporary configuration files and testing the YAML parsing functionality.
**13-16**: LGTM! Proper imports for partition functionality.

The imports for `read_partition_file` and `get_best_partition` correctly bring in the new partition selection functions being tested.
**486-486**: Comprehensive test coverage is appropriate despite the static analysis warning.

The pylint warning about too many public methods (28/20) is a false positive for test classes. Comprehensive test suites naturally require many test methods to cover all scenarios, and this level of coverage is beneficial for the new partition selection feature.

🧰 Tools
🪛 Pylint (3.3.7)

[refactor] 486-486: Too many public methods (28/20) (R0904)
**557-567**: LGTM! Well-designed helper fixture for temporary files.

The `temp_yaml_file` fixture provides a clean way to create temporary YAML files for testing. The implementation properly handles file creation and returns the path for cleanup.
**569-709**: Excellent coverage of file reading scenarios.

The tests comprehensively cover various file reading scenarios, including:
- Valid partition configurations with different complexity levels
- Invalid YAML syntax
- Missing files and required keys
- Proper cleanup of temporary files

The error handling validation ensures the partition parsing is robust.
**734-970**: Comprehensive partition selection test coverage.

The partition selection tests effectively validate the core functionality, including:
- CPU and GPU job matching
- MPI support validation
- Constraint compatibility checking
- Resource limit enforcement
- Edge cases and error scenarios

The tests properly verify both successful partition selection and appropriate handling when no suitable partition exists.
@cademirch Hoi Cade, this is most impressive and looks like a major step forward! Alas, I will be at a conference for the next week, and this PR really needs to be thoroughly tested. So, please allow me a bit more time than usual. On the plus side: I will add a '!' to the title, as this might be worth a major release.
Of course, I implemented various tests (albeit with LLM assistance, though I looked them over carefully and they mostly make sense). I expect there are likely use cases and SLURM configurations I've overlooked, on which your feedback will be very helpful. Enjoy your conference!
@cmeesters Gentle bump here, would be great to discuss this.
@cademirch I'm sorry - I forgot to notify you: I didn't have a cluster for the last week (cooling issues). Hopefully, by next week, we will be open again. Yet, I do not know how much time I will find: holidays are approaching. I really want to test this. It would be a wonderful addition!

Update: still no working cluster for me. As I will be on holiday until mid-July, there will be little progress, I'm afraid. I want to attempt a first test once the cluster is up again, but whether this leaves me time to do something substantial, I do not know.
Based on some of the work in https://github.com/harvardinformatics/snakemake-executor-plugin-cannon, I've put together this PR to add dynamic partition selection based on job resource requirements. Relevant discussion in #106
This works by having the user provide a YAML file that specifies their cluster's partitions and the resource limits of each, via the option `--slurm-partition-config`. The expected structure of the file is simple.

I realize adding another config file isn't ideal given that workflow-specific configs, workflow profiles, and global profiles already exist. But this approach avoids the complexity of having to determine cluster configurations through SLURM commands, which can vary wildly (as discussed in #106). It also sidesteps the need for complex expressions in set-resources to determine partitions, which can get unwieldy for big workflows with lots of rules. Ideally, users can craft a `partitions.yml` once and set it in their global profile.

I'm not super familiar with SLURM and partition resource limits, so I came up with a list based on the Snakemake standard resources and SLURM resources:
The following limits can be defined for each partition:
- `max_runtime`
- `max_mem_mb`
- `max_mem_mb_per_cpu`
- `max_cpus_per_task`
- `max_nodes`
- `max_tasks`
- `max_tasks_per_node`
- `max_gpu`
- `available_gpu_models`
- `max_cpus_per_gpu`
- `supports_mpi`
- `max_mpi_tasks`
- `available_constraints`
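For illustration, a minimal `partitions.yml` using these limit names might look like the following. The partition names and values here are hypothetical; consult the docs added in this PR for the exact schema:

```yaml
partitions:
  short:
    max_runtime: 120          # assumed to be minutes, per Snakemake's runtime resource
    max_cpus_per_task: 16
    max_mem_mb: 64000
  gpu:
    max_runtime: 2880
    max_gpu: 4
    available_gpu_models: ["a100", "v100"]
  bigmpi:
    supports_mpi: true
    max_mpi_tasks: 1024
    available_constraints: ["intel", "amd"]
```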
It could be possible to support arbitrary resources by pattern matching on "max_{resource}", though I've not implemented that in this PR.
To pick the "best" partition for a job, I went with a naive scoring approach that calculates a score by summing the ratios of requested resources to partition limits. Higher scores should indicate better resource utilization, for example: a job requesting 8 CPUs would prefer a 16-CPU partition (score 0.5) over a 64-CPU partition (score 0.125). Partitions that cannot satisfy a job's requirements are not considered.
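A rough sketch of such ratio-based scoring (illustrative only; the function and field names here are hypothetical and do not necessarily match the PR's code):

```python
# Illustrative sketch of ratio-sum partition scoring; names are hypothetical.
from math import isinf
from typing import Optional

def score_partition(requested: dict, limits: dict) -> Optional[float]:
    """Return a fit score for a partition, or None if it cannot satisfy the job.

    The score is the sum of requested/limit ratios, so a partition whose
    limits closely match the request scores higher than an oversized one.
    """
    score = 0.0
    for resource, req in requested.items():
        limit = limits.get(resource, float("inf"))
        if req > limit:
            return None  # partition cannot satisfy this requirement
        if req and not isinf(limit) and limit > 0:
            score += req / limit
    return score

# An 8-CPU job prefers a 16-CPU partition (0.5) over a 64-CPU one (0.125);
# a 4-CPU partition is excluded entirely.
print(score_partition({"cpus": 8}, {"cpus": 16}))   # 0.5
print(score_partition({"cpus": 8}, {"cpus": 64}))   # 0.125
print(score_partition({"cpus": 8}, {"cpus": 4}))    # None
```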
This feature is opt-in and respects the `slurm_partition` job resource, as well as existing fallback partition logic.

@cmeesters, @johanneskoester let me know what you think of this approach! I'm not particularly experienced with SLURM, so I've made decisions here (limits, partition specs, etc.) based on my limited experience and the available docs, so feedback is much appreciated.
Summary by CodeRabbit
- New Features
- Documentation
- Tests