Updates in the threads design of the tool because of random fails in HPC by ypriverol · Pull Request #60 · bigbio/quantms-rescoring

ypriverol · 2026-01-30T13:18:44Z

Summary by CodeRabbit

Release Notes

New Features
- Added GPU disabling capability for CPU-only environments to prevent initialization issues on HPC clusters.
Bug Fixes
- Improved error logging with exception traces for better debugging.
- Added 1-hour pool timeout to prevent process hangs.
- Enhanced worker process initialization to reduce log spam.
Documentation
- Comprehensive HPC/Slurm stability features guide added.
- Updated thread management documentation with new GPU disabling behavior.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2026-01-30T13:19:09Z

📝 Walkthrough

Walkthrough

This pull request adds GPU disabling support for HPC/CPU-only environments and introduces robust multiprocessing pool management with worker initialization, timeout handling, and improved logging across the quantmsrescore module. Core changes include a new disable_gpu parameter in threading configuration, worker process setup utilities, and enhanced error handling.

Changes

Cohort / File(s)	Summary
Configuration & Worker Setup `quantmsrescore/__init__.py`, `quantmsrescore/logging_config.py`	Added `disable_gpu` parameter to `configure_threading()` to set `CUDA_VISIBLE_DEVICES=""` before imports; new `configure_worker_process()` function suppresses warnings and configures environment variables and thread limits for multiprocessing workers.
Multiprocessing Pool Management `quantmsrescore/alphapeptdeep.py`, `quantmsrescore/ms2pip.py`, `quantmsrescore/deeplc.py`	Introduces robust pool retrieval with context creation, spawn mode, and fallback to dummy pools; adds 3600-second timeout on pool result retrieval with error logging and termination; initializes pools with `configure_worker_process` to reduce log spam.
Error Logging & Exception Handling `quantmsrescore/annotator.py`	Enhanced exception logging with `exc_info=True` for MS2PIP and AlphaPeptDeep failures; narrowed exception handlers in search parameter updates to specific exception types with debug logging.
GPU Disabling Integration `quantmsrescore/ms2rescore.py`, `quantmsrescore/transfer_learning.py`	Passes `disable_gpu=True` to `configure_threading()` calls to prevent CUDA initialization on CPU-only nodes.
Documentation & Metadata `README.md`, `.gitignore`	Added comprehensive "HPC Stability Features" section documenting GPU disabling, pool timeouts, and worker initialization; updated threading documentation to reflect HPC mode behavior; added `.cursor/rules/codacy.mdc` to gitignore.

Sequence Diagram(s)

sequenceDiagram
    participant App as Application
    participant Pool as Pool Manager
    participant Worker as Worker Process
    participant Config as configure_worker_process()
    participant Result as Result Handler

    App->>Pool: Initialize pool with initializer
    Pool->>Worker: Spawn worker process
    Worker->>Config: Call initializer function
    Config->>Config: Suppress warnings
    Config->>Config: Set CUDA_VISIBLE_DEVICES=""
    Config->>Config: Configure thread limits
    Config-->>Worker: Initialization complete
    App->>Pool: Submit async task (apply_async)
    Pool->>Worker: Execute task
    Worker-->>Result: Return result (with timeout)
    Result->>Result: Check timeout (3600s)
    alt Timeout Error
        Result->>Pool: Terminate pool
        Result-->>App: Re-raise TimeoutError
    else Success
        Result-->>App: Return result
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

Review effort 4/5

Suggested reviewers

daichengxin

Poem

🐰 A pool of workers, carefully tamed,
No CUDA ghosts on HPC's domain,
With timeouts set and warnings suppressed,
Our spawned processes pass the test!
Stability blooms in the multiprocess nest. ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main objective of this PR: addressing HPC environment issues through threading and GPU management updates. The changes comprehensively implement GPU disabling, worker process initialization, pool timeout safeguards, and documentation updates specifically for HPC/Slurm stability.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch dev

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@quantmsrescore/deeplc.py`:
- Line 110: The ProForma charge-splitting uses a backslash; update both
occurrences that call psm.peptidoform.proforma.split("\\")[0] to split on a
forward slash instead (split("/")[0]) so peptide keys are
charge-agnostic—replace the usage at the assignment to peptide (the line with
peptide = psm.peptidoform.proforma.split...) and the lookup/assignment into
peptide_rt_diff_dict that references psm.peptidoform.proforma.split("\\")[0].

🧹 Nitpick comments (6)

.gitignore (1)
167-170: LGTM!

The addition of the Cursor AI rules directory to .gitignore is appropriate for excluding tooling artifacts from version control.

Optional: Consider adding a space after # in the comment for consistency with Python comment conventions.
✨ Optional formatting improvement
-#Ignore cursor AI rules
+# Ignore cursor AI rules
 .cursor/rules/codacy.mdc
quantmsrescore/logging_config.py (1)
200-200: Consider moving import os to module level.

The os module is already imported at module level in quantmsrescore/__init__.py and other files. Moving this import to the top of the file would be more consistent with the codebase style and slightly more efficient for repeated calls.
Suggested change

Add at the top of the file with other imports:
import os
Then remove line 200.
quantmsrescore/annotator.py (1)
408-409: Use f-string conversion flag instead of explicit str() call.

Per static analysis (Ruff RUF010), prefer the !s conversion flag for cleaner syntax.
Suggested fix
         logger.info(
-            f"Successfully applied AlphaPeptDeep annotation using model: {str(alphapeptdeep_generator._peptdeep_model)}")
+            f"Successfully applied AlphaPeptDeep annotation using model: {alphapeptdeep_generator._peptdeep_model!s}")
quantmsrescore/ms2pip.py (1)
183-191: Consider using logger.exception for richer error context.

The static analysis tool suggests using logging.exception instead of logging.error (Ruff TRY400). While for TimeoutError the traceback isn't particularly informative, using logger.exception is idiomatic in except blocks and provides consistent logging behavior.

Note: The pool.close() and pool.join() calls at lines 190-191 won't execute after the re-raise, but this is acceptable since pool.terminate() already handles worker cleanup.
♻️ Optional: Use logger.exception for consistency
             try:
                 results = [r.get(timeout=_POOL_GET_TIMEOUT) for r in mp_results]
             except multiprocessing.TimeoutError:
-                logger.error(f"Pool operation timed out after {_POOL_GET_TIMEOUT} seconds")
+                logger.exception(f"Pool operation timed out after {_POOL_GET_TIMEOUT} seconds")
                 pool.terminate()
                 raise
quantmsrescore/alphapeptdeep.py (2)
218-245: Code duplication with ms2pip.py.

This _get_pool method is nearly identical to PatchParallelized._get_pool() in quantmsrescore/ms2pip.py (lines 200-224). Consider extracting this to a shared utility function (e.g., in quantmsrescore/logging_config.py alongside configure_worker_process, or a new multiprocessing_utils.py module) to reduce maintenance burden and ensure consistent behavior.
♻️ Suggested shared utility approach

Create a shared utility function that both classes can use:
# In quantmsrescore/multiprocessing_utils.py or logging_config.py
def get_multiprocessing_pool(processes: int, initializer=None, logger=None):
    """Get multiprocessing pool with recursion/daemon protection."""
    if logger:
        logger.debug(f"Starting workers (processes={processes})...")

    if multiprocessing.current_process().daemon:
        if logger:
            logger.warning(
                "Running in a daemon process. Disabling multiprocessing as daemonic "
                "processes cannot have children."
            )
        return multiprocessing.dummy.Pool(1)

    if processes == 1:
        if logger:
            logger.debug("Using dummy multiprocessing pool.")
        return multiprocessing.dummy.Pool(1)

    if multiprocessing.parent_process() is not None:
        if logger:
            logger.warning(
                "Attempting to create a pool inside a worker process! "
                "Returning a dummy pool instead."
            )
        return multiprocessing.dummy.Pool(1)

    return multiprocessing.get_context("spawn").Pool(
        processes,
        initializer=initializer or configure_worker_process
    )
990-1049: Code duplication: read_spectrum_file function.

This function is identical to read_spectrum_file in quantmsrescore/ms2pip.py (lines 560-619). Consider extracting it to a shared module (e.g., quantmsrescore/spectrum_utils.py or within quantmsrescore/openms.py) to eliminate duplication and ensure consistent behavior across both feature generators.

coderabbitai · 2026-01-30T13:24:06Z

quantmsrescore/deeplc.py

+                                    "rt_diff": rt_diffs_run[i],
+                                }
+                            )
+                            peptide = psm.peptidoform.proforma.split("\\")[0]  # remove charge


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Search for peptidoform and proforma parsing patterns rg -n "peptidoform.*split" --type py rg -n "proforma.*split" --type py

Repository: bigbio/quantms-rescoring

Length of output: 561

🏁 Script executed:

# Check the broader context around line 110 in deeplc.py fd "deeplc.py" -x cat -n {}

Repository: bigbio/quantms-rescoring

Length of output: 6653

🏁 Script executed:

# Search for any peptidoform or proforma usage patterns rg -n "peptidoform" --type py -A 2 -B 2

Repository: bigbio/quantms-rescoring

Length of output: 10247

🏁 Script executed:

# Search for test files and example data fd -e py -path "*/test*" | head -20 rg -n "proforma" --type py -B 3 -A 3

Repository: bigbio/quantms-rescoring

Length of output: 1536

🏁 Script executed:

# Search for any example PSM data or format specifications fd -e json -o -e csv -o -e txt | xargs grep -l "peptidoform\|proforma" 2>/dev/null | head -10

Repository: bigbio/quantms-rescoring

Length of output: 184

🏁 Script executed:

# Look for PSM object definition or peptidoform usage documentation rg -n "class.*PSM\|class.*Peptidoform\|class.*peptidoform" --type py

Repository: bigbio/quantms-rescoring

Length of output: 50

🏁 Script executed:

# Search for any evidence of the actual proforma string format rg -n "\.proforma\s*=" --type py -A 1 -B 1

Repository: bigbio/quantms-rescoring

Length of output: 50

🏁 Script executed:

# Find test files and fixtures find . -type f -name "*test*.py" | head -20

Repository: bigbio/quantms-rescoring

Length of output: 92

🏁 Script executed:

# Check dependencies cat requirements.txt setup.py pyproject.toml 2>/dev/null | grep -i psm

Repository: bigbio/quantms-rescoring

Length of output: 93

🏁 Script executed:

# Look for any pytest fixtures or test data showing proforma format rg -n "proforma.*=" --type py -A 2

Repository: bigbio/quantms-rescoring

Length of output: 50

🏁 Script executed:

# Search for logging or print statements that might show proforma values rg -n "logger.*proforma\|print.*proforma\|f.*proforma" --type py

Repository: bigbio/quantms-rescoring

Length of output: 50

🏁 Script executed:

# Check if there's any documentation or README mentioning format rg -n "proforma\|peptidoform" README.md CONTRIBUTING.md docs/ 2>/dev/null -A 2

Repository: bigbio/quantms-rescoring

Length of output: 50

🏁 Script executed:

# Check test file for PSM examples cat -n ./tests/test_commands.py | head -100

Repository: bigbio/quantms-rescoring

Length of output: 3637

🏁 Script executed:

# Search for any example input files or data find . -type f \( -name "*.mzml" -o -name "*.mgf" -o -name "*.idxml" -o -name "*.pepxml" \) 2>/dev/null | head -5

Repository: bigbio/quantms-rescoring

Length of output: 50

🏁 Script executed:

# Look for any JSON or TSV test data find ./tests -type f \( -name "*.json" -o -name "*.tsv" -o -name "*.csv" \) 2>/dev/null

Repository: bigbio/quantms-rescoring

Length of output: 179

🌐 Web query:

ProForma notation standard charge separation format

💡 Result:

In HUPO‑PSI ProForma (v2.x), the precursor charge state is written by appending a slash and an integer at the very end of the ProForma string:

Format: .../z where z is the charge (e.g., 2 for 2+)

Example: VAEINPSNGGTT/2 (sequence with charge 2+) [1]

This is the common “charge separation” convention used by ProForma parsers/tools (e.g., AC[UNIMOD:4]DEK/2) [2]

Sources: [1] [2]

🌐 Web query:

psm-utils peptidoform proforma property format documentation

💡 Result:

In psm-utils, Peptidoform is essentially a thin wrapper around a ProForma v2.0 sequence parsed by pyteomics, and its properties field is the parsed sequence-wide “modifiers/properties” mapping coming from the ProForma parser. (psm-utils.readthedocs.io)

psm-utils docs (Peptidoform + properties)

API reference: psm_utils.Peptidoform / psm_utils.peptidoform.Peptidoform

Documents properties as: “Dict with sequence-wide properties” (type dict[str, Any]). (psm-utils.readthedocs.io)

Quickstart: shows ProForma 2.0 usage via Peptidoform("...") and notes parsing/support is via pyteomics.proforma. (psm-utils.readthedocs.io)

What “properties” contains (actual structure / keys)

The most concrete documentation of the shape of the “properties/modifiers” mapping is in pyteomics’ ProForma docs. The parse() example shows the returned modifiers dict containing keys like:

n_term, c_term

unlocalized_modifications, labile_modifications, fixed_modifications

intervals, isotopes, group_ids
(and depending on context, a charge_state field is also part of the ProForma representation). (pyteomics.readthedocs.io)

So in practice:

Peptidoform.parsed_sequence ~= the per-residue parsed list

Peptidoform.properties ~= the sequence-wide modifiers/properties dict from the ProForma parse (psm-utils.readthedocs.io)

ProForma “property format” (syntax in the string)

If by “property format” you mean user-supplied key/value-style annotations inside a ProForma tag, ProForma supports “info tags” like:

ELV[info:AnyString]IS

ELV[+11.9784|info:suspected frobinylation]IS (github.com)

Canonical spec reference

For the authoritative ProForma 2.0 specification and examples, use the HUPO-PSI ProForma repo / spec materials. (github.com)

Citations:

1: https://psm-utils.readthedocs.io/en/stable/api/psm_utils/?utm_source=openai

2: https://psm-utils.readthedocs.io/en/stable/api/psm_utils/?utm_source=openai

3: https://psm-utils.readthedocs.io/en/v1.0.0/quickstart/?utm_source=openai

4: https://pyteomics.readthedocs.io/en/latest/api/proforma.html?utm_source=openai

5: https://psm-utils.readthedocs.io/en/stable/api/psm_utils/?utm_source=openai

6: https://github.com/HUPO-PSI/ProForma

7: https://github.com/HUPO-PSI/ProForma?utm_source=openai

Use forward slash / instead of backslash \ to remove charge from ProForma strings.

ProForma v2.0 standard (used by psm-utils via pyteomics) appends charge with a forward slash (e.g., PEPTIDE/2), not a backslash. The current code split("\\")[0] fails to remove the charge, leaving it included in dictionary keys at lines 110 and 119. This causes different charge states of the same peptide to be treated as separate entries, fragmenting retention time aggregation and comparisons.

Change both occurrences from split("\\")[0] to split("/")[0]:

Line 110: peptide = psm.peptidoform.proforma.split("/")[0]

Line 119: peptide_rt_diff_dict[psm.peptidoform.proforma.split("/")[0]]

🤖 Prompt for AI Agents

In `@quantmsrescore/deeplc.py` at line 110, The ProForma charge-splitting uses a backslash; update both occurrences that call psm.peptidoform.proforma.split("\\")[0] to split on a forward slash instead (split("/")[0]) so peptide keys are charge-agnostic—replace the usage at the assignment to peptide (the line with peptide = psm.peptidoform.proforma.split...) and the lookup/assignment into peptide_rt_diff_dict that references psm.peptidoform.proforma.split("\\")[0].

Updates in the threads design of the tool.

f9dc75b

ypriverol requested a review from daichengxin January 30, 2026 13:22

coderabbitai bot reviewed Jan 30, 2026

View reviewed changes

daichengxin approved these changes Feb 1, 2026

View reviewed changes

ypriverol merged commit a77ab46 into main Feb 1, 2026
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Updates in the threads design of the tool because of random fails in HPC#60

Updates in the threads design of the tool because of random fails in HPC#60
ypriverol merged 1 commit intomainfrom
dev

ypriverol commented Jan 30, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Jan 30, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Jan 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ypriverol commented Jan 30, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai bot commented Jan 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 30, 2026

Choose a reason for hiding this comment

psm-utils docs (Peptidoform + properties)

What “properties” contains (actual structure / keys)

ProForma “property format” (syntax in the string)

Canonical spec reference

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ypriverol commented Jan 30, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 30, 2026 •

edited

Loading

psm-utils docs (Peptidoform + `properties`)