📝 Walkthrough
This pull request adds GPU disabling support for HPC/CPU-only environments and introduces robust multiprocessing pool management with worker initialization, timeout handling, and improved logging across the `quantmsrescore` module. Core changes include a new
Changes
Sequence Diagram(s)

    sequenceDiagram
        participant App as Application
        participant Pool as Pool Manager
        participant Worker as Worker Process
        participant Config as configure_worker_process()
        participant Result as Result Handler
        App->>Pool: Initialize pool with initializer
        Pool->>Worker: Spawn worker process
        Worker->>Config: Call initializer function
        Config->>Config: Suppress warnings
        Config->>Config: Set CUDA_VISIBLE_DEVICES=""
        Config->>Config: Configure thread limits
        Config-->>Worker: Initialization complete
        App->>Pool: Submit async task (apply_async)
        Pool->>Worker: Execute task
        Worker-->>Result: Return result (with timeout)
        Result->>Result: Check timeout (3600s)
        alt Timeout Error
            Result->>Pool: Terminate pool
            Result-->>App: Re-raise TimeoutError
        else Success
            Result-->>App: Return result
        end

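A minimal Python sketch of the flow in the diagram. It assumes the initializer performs only the three steps shown; the body of `configure_worker_process()`, the specific thread-limit variables, and the `run_tasks` helper are illustrative assumptions, not the PR's implementation. `_POOL_GET_TIMEOUT = 3600` mirrors the diagram's timeout check.

```python
import multiprocessing
import os
import warnings
from multiprocessing.pool import ThreadPool

_POOL_GET_TIMEOUT = 3600  # seconds, as in the diagram's timeout check


def configure_worker_process():
    """Hypothetical initializer mirroring the diagram's three steps."""
    warnings.filterwarnings("ignore")        # suppress warnings in the worker
    os.environ["CUDA_VISIBLE_DEVICES"] = ""  # hide every GPU from this worker
    # Assumed thread-limit variables; the PR may set different ones.
    for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
        os.environ[var] = "1"


def run_tasks(func, items, processes=2, timeout=_POOL_GET_TIMEOUT, pool_factory=None):
    """Run func(item) for each item in a pool, collecting results with a timeout.

    pool_factory defaults to a spawn-context process pool; any callable taking
    (processes, initializer=...) works, e.g. multiprocessing.pool.ThreadPool.
    """
    if pool_factory is None:
        pool_factory = multiprocessing.get_context("spawn").Pool
    pool = pool_factory(processes, initializer=configure_worker_process)
    async_results = [pool.apply_async(func, (item,)) for item in items]
    try:
        results = [r.get(timeout=timeout) for r in async_results]
    except Exception:
        # Covers multiprocessing.TimeoutError (the diagram's case) and errors
        # raised inside tasks: stop the workers, then re-raise to the caller.
        pool.terminate()
        raise
    pool.close()
    pool.join()
    return results


if __name__ == "__main__":
    # ThreadPool keeps this demo free of pickling/spawn requirements.
    print(run_tasks(abs, [-1, 2, -3], pool_factory=ThreadPool))
```

With the default spawn pool, `func` must be a picklable module-level function and the calling script needs an `if __name__ == "__main__":` guard. Environment variables set by the initializer only affect libraries that initialize after it runs; with `ThreadPool` the initializer runs in threads of the parent process, so its settings apply there instead.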
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@quantmsrescore/deeplc.py`:
- Line 110: The ProForma charge-splitting uses a backslash; update both
occurrences that call psm.peptidoform.proforma.split("\\")[0] to split on a
forward slash instead (split("/")[0]) so peptide keys are
charge-agnostic—replace the usage at the assignment to peptide (the line with
peptide = psm.peptidoform.proforma.split...) and the lookup/assignment into
peptide_rt_diff_dict that references psm.peptidoform.proforma.split("\\")[0].
🧹 Nitpick comments (6)
.gitignore (1)
167-170: LGTM!
The addition of the Cursor AI rules directory to `.gitignore` is appropriate for excluding tooling artifacts from version control.
Optional: Consider adding a space after `#` in the comment for consistency with Python comment conventions.
✨ Optional formatting improvement

    -#Ignore cursor AI rules
    +# Ignore cursor AI rules
     .cursor/rules/codacy.mdc

quantmsrescore/logging_config.py (1)
200-200: Consider moving `import os` to module level.
The `os` module is already imported at module level in `quantmsrescore/__init__.py` and other files. Moving this import to the top of the file would be more consistent with the codebase style and slightly more efficient for repeated calls.
Suggested change
Add at the top of the file with other imports:

    import os

Then remove line 200.
quantmsrescore/annotator.py (1)
408-409: Use f-string conversion flag instead of explicit `str()` call.
Per static analysis (Ruff RUF010), prefer the `!s` conversion flag for cleaner syntax.
Suggested fix
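A quick check that the suggested change is purely cosmetic: the `!s` conversion calls `str()` on the value before formatting, so both spellings render identically. `Model` is a hypothetical stand-in for `alphapeptdeep_generator._peptdeep_model`.

```python
class Model:
    """Hypothetical stand-in for the real model object."""

    def __str__(self):
        return "demo-model"

    def __repr__(self):
        return "Model()"


model = Model()
explicit = f"using model: {str(model)}"  # current spelling
flagged = f"using model: {model!s}"      # spelling suggested by RUF010
print(explicit)       # using model: demo-model
print(flagged)        # using model: demo-model
print(f"{model!r}")   # Model()  (!r calls repr() instead of str())
```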

         logger.info(
    -        f"Successfully applied AlphaPeptDeep annotation using model: {str(alphapeptdeep_generator._peptdeep_model)}")
    +        f"Successfully applied AlphaPeptDeep annotation using model: {alphapeptdeep_generator._peptdeep_model!s}")

quantmsrescore/ms2pip.py (1)
183-191: Consider using `logger.exception` for richer error context.
The static analysis tool suggests using `logging.exception` instead of `logging.error` (Ruff TRY400). While for `TimeoutError` the traceback isn't particularly informative, using `logger.exception` is idiomatic in except blocks and provides consistent logging behavior.
Note: The `pool.close()` and `pool.join()` calls at lines 190-191 won't execute after the re-raise, but this is acceptable since `pool.terminate()` already handles worker cleanup.
♻️ Optional: Use logger.exception for consistency
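A self-contained illustration of what TRY400 is about: inside an `except` block, `logger.exception(...)` logs at ERROR level with the active exception attached (`exc_info`), whereas `logger.error(...)` records the message only. The logger name and the in-memory `ListHandler` are illustrative, and the builtin `TimeoutError` stands in for `multiprocessing.TimeoutError`.

```python
import logging


class ListHandler(logging.Handler):
    """Collects log records in memory so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("pool_timeout_demo")
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the demo out of the root logger's output
handler = ListHandler()
logger.addHandler(handler)

try:
    raise TimeoutError("simulated pool timeout")
except TimeoutError:
    logger.error("Pool operation timed out")      # message only
    logger.exception("Pool operation timed out")  # message plus traceback

plain, with_tb = handler.records
print(plain.levelname, plain.exc_info)          # ERROR None
print(with_tb.levelname, with_tb.exc_info[0])   # ERROR <class 'TimeoutError'>
```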

         try:
             results = [r.get(timeout=_POOL_GET_TIMEOUT) for r in mp_results]
         except multiprocessing.TimeoutError:
    -        logger.error(f"Pool operation timed out after {_POOL_GET_TIMEOUT} seconds")
    +        logger.exception(f"Pool operation timed out after {_POOL_GET_TIMEOUT} seconds")
             pool.terminate()
             raise

quantmsrescore/alphapeptdeep.py (2)
218-245: Code duplication with `ms2pip.py`.
This `_get_pool` method is nearly identical to `PatchParallelized._get_pool()` in `quantmsrescore/ms2pip.py` (lines 200-224). Consider extracting this to a shared utility function (e.g., in `quantmsrescore/logging_config.py` alongside `configure_worker_process`, or a new `multiprocessing_utils.py` module) to reduce maintenance burden and ensure consistent behavior.
♻️ Suggested shared utility approach
Create a shared utility function that both classes can use:

    # In quantmsrescore/multiprocessing_utils.py or logging_config.py
    import multiprocessing
    import multiprocessing.dummy  # not loaded by `import multiprocessing` alone

    # Only needed when this function lives outside logging_config.py.
    from quantmsrescore.logging_config import configure_worker_process


    def get_multiprocessing_pool(processes: int, initializer=None, logger=None):
        """Get multiprocessing pool with recursion/daemon protection."""
        if logger:
            logger.debug(f"Starting workers (processes={processes})...")
        if multiprocessing.current_process().daemon:
            if logger:
                logger.warning(
                    "Running in a daemon process. Disabling multiprocessing as daemonic "
                    "processes cannot have children."
                )
            return multiprocessing.dummy.Pool(1)
        if processes == 1:
            if logger:
                logger.debug("Using dummy multiprocessing pool.")
            return multiprocessing.dummy.Pool(1)
        if multiprocessing.parent_process() is not None:
            if logger:
                logger.warning(
                    "Attempting to create a pool inside a worker process! "
                    "Returning a dummy pool instead."
                )
            return multiprocessing.dummy.Pool(1)
        return multiprocessing.get_context("spawn").Pool(
            processes, initializer=initializer or configure_worker_process
        )

990-1049: Code duplication: `read_spectrum_file` function.
This function is identical to `read_spectrum_file` in `quantmsrescore/ms2pip.py` (lines 560-619). Consider extracting it to a shared module (e.g., `quantmsrescore/spectrum_utils.py` or within `quantmsrescore/openms.py`) to eliminate duplication and ensure consistent behavior across both feature generators.

                "rt_diff": rt_diffs_run[i],
            }
        )
    peptide = psm.peptidoform.proforma.split("\\")[0]  # remove charge

🧩 Analysis chain
🏁 Script executed:
# Search for peptidoform and proforma parsing patterns
rg -n "peptidoform.*split" --type py
rg -n "proforma.*split" --type py
Repository: bigbio/quantms-rescoring
Length of output: 561
🏁 Script executed:
# Check the broader context around line 110 in deeplc.py
fd "deeplc.py" -x cat -n {}
Repository: bigbio/quantms-rescoring
Length of output: 6653
🏁 Script executed:
# Search for any peptidoform or proforma usage patterns
rg -n "peptidoform" --type py -A 2 -B 2
Repository: bigbio/quantms-rescoring
Length of output: 10247
🏁 Script executed:
# Search for test files and example data
fd -e py -path "*/test*" | head -20
rg -n "proforma" --type py -B 3 -A 3
Repository: bigbio/quantms-rescoring
Length of output: 1536
🏁 Script executed:
# Search for any example PSM data or format specifications
fd -e json -o -e csv -o -e txt | xargs grep -l "peptidoform\|proforma" 2>/dev/null | head -10
Repository: bigbio/quantms-rescoring
Length of output: 184
🏁 Script executed:
# Look for PSM object definition or peptidoform usage documentation
rg -n "class.*PSM\|class.*Peptidoform\|class.*peptidoform" --type py
Repository: bigbio/quantms-rescoring
Length of output: 50
🏁 Script executed:
# Search for any evidence of the actual proforma string format
rg -n "\.proforma\s*=" --type py -A 1 -B 1
Repository: bigbio/quantms-rescoring
Length of output: 50
🏁 Script executed:
# Find test files and fixtures
find . -type f -name "*test*.py" | head -20
Repository: bigbio/quantms-rescoring
Length of output: 92
🏁 Script executed:
# Check dependencies
cat requirements.txt setup.py pyproject.toml 2>/dev/null | grep -i psm
Repository: bigbio/quantms-rescoring
Length of output: 93
🏁 Script executed:
# Look for any pytest fixtures or test data showing proforma format
rg -n "proforma.*=" --type py -A 2
Repository: bigbio/quantms-rescoring
Length of output: 50
🏁 Script executed:
# Search for logging or print statements that might show proforma values
rg -n "logger.*proforma\|print.*proforma\|f.*proforma" --type py
Repository: bigbio/quantms-rescoring
Length of output: 50
🏁 Script executed:
# Check if there's any documentation or README mentioning format
rg -n "proforma\|peptidoform" README.md CONTRIBUTING.md docs/ 2>/dev/null -A 2
Repository: bigbio/quantms-rescoring
Length of output: 50
🏁 Script executed:
# Check test file for PSM examples
cat -n ./tests/test_commands.py | head -100
Repository: bigbio/quantms-rescoring
Length of output: 3637
🏁 Script executed:
# Search for any example input files or data
find . -type f \( -name "*.mzml" -o -name "*.mgf" -o -name "*.idxml" -o -name "*.pepxml" \) 2>/dev/null | head -5
Repository: bigbio/quantms-rescoring
Length of output: 50
🏁 Script executed:
# Look for any JSON or TSV test data
find ./tests -type f \( -name "*.json" -o -name "*.tsv" -o -name "*.csv" \) 2>/dev/null
Repository: bigbio/quantms-rescoring
Length of output: 179
🌐 Web query:
ProForma notation standard charge separation format
💡 Result:
In HUPO‑PSI ProForma (v2.x), the precursor charge state is written by appending a slash and an integer at the very end of the ProForma string:
- Format: `.../z`, where `z` is the charge (e.g., `2` for 2+)
- Example: `VAEINPSNGGTT/2` (sequence with charge 2+) [1]
- This is the common “charge separation” convention used by ProForma parsers/tools (e.g., `AC[UNIMOD:4]DEK/2`) [2]

Sources: [1] [2]
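To make the convention concrete, a minimal sketch that separates that trailing charge from a ProForma string. It assumes the string ends in a plain positive integer charge; other charge annotations are out of scope. `split_charge` is a hypothetical helper, not a psm-utils or pyteomics API.

```python
import re

# A trailing "/<integer>" at the very end of the string, e.g. "AC[UNIMOD:4]DEK/2".
_CHARGE_SUFFIX = re.compile(r"/(\d+)$")


def split_charge(proforma: str):
    """Return (sequence_without_charge, charge), with charge None if absent."""
    match = _CHARGE_SUFFIX.search(proforma)
    if match is None:
        return proforma, None
    return proforma[: match.start()], int(match.group(1))


print(split_charge("VAEINPSNGGTT/2"))     # ('VAEINPSNGGTT', 2)
print(split_charge("AC[UNIMOD:4]DEK/2"))  # ('AC[UNIMOD:4]DEK', 2)
print(split_charge("PEPTIDE"))            # ('PEPTIDE', None)
```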
🌐 Web query:
psm-utils peptidoform proforma property format documentation
💡 Result:
In psm-utils, Peptidoform is essentially a thin wrapper around a ProForma v2.0 sequence parsed by pyteomics, and its properties field is the parsed sequence-wide “modifiers/properties” mapping coming from the ProForma parser. (psm-utils.readthedocs.io)
psm-utils docs (Peptidoform + properties)
- API reference: `psm_utils.Peptidoform` / `psm_utils.peptidoform.Peptidoform`
  - Documents `properties` as: “Dict with sequence-wide properties” (type `dict[str, Any]`). (psm-utils.readthedocs.io)
- Quickstart: shows ProForma 2.0 usage via `Peptidoform("...")` and notes parsing/support is via `pyteomics.proforma`. (psm-utils.readthedocs.io)
What “properties” contains (actual structure / keys)
The most concrete documentation of the shape of the “properties/modifiers” mapping is in pyteomics’ ProForma docs. The parse() example shows the returned modifiers dict containing keys like:
- `n_term`, `c_term`
- `unlocalized_modifications`, `labile_modifications`, `fixed_modifications`
- `intervals`, `isotopes`, `group_ids`

(and depending on context, a `charge_state` field is also part of the ProForma representation). (pyteomics.readthedocs.io)
So in practice:
- `Peptidoform.parsed_sequence` ~= the per-residue parsed list
- `Peptidoform.properties` ~= the sequence-wide modifiers/properties dict from the ProForma parse (psm-utils.readthedocs.io)
ProForma “property format” (syntax in the string)
If by “property format” you mean user-supplied key/value-style annotations inside a ProForma tag, ProForma supports “info tags” like:
- `ELV[info:AnyString]IS`
- `ELV[+11.9784|info:suspected frobinylation]IS` (github.com)
Canonical spec reference
For the authoritative ProForma 2.0 specification and examples, use the HUPO-PSI ProForma repo / spec materials. (github.com)
Citations:
- 1: https://psm-utils.readthedocs.io/en/stable/api/psm_utils/
- 2: https://psm-utils.readthedocs.io/en/stable/api/psm_utils/
- 3: https://psm-utils.readthedocs.io/en/v1.0.0/quickstart/
- 4: https://pyteomics.readthedocs.io/en/latest/api/proforma.html
- 5: https://psm-utils.readthedocs.io/en/stable/api/psm_utils/
- 6: https://github.com/HUPO-PSI/ProForma
- 7: https://github.com/HUPO-PSI/ProForma
Use forward slash / instead of backslash \ to remove charge from ProForma strings.
ProForma v2.0 standard (used by psm-utils via pyteomics) appends charge with a forward slash (e.g., PEPTIDE/2), not a backslash. The current code split("\\")[0] fails to remove the charge, leaving it included in dictionary keys at lines 110 and 119. This causes different charge states of the same peptide to be treated as separate entries, fragmenting retention time aggregation and comparisons.
Change both occurrences from `split("\\")[0]` to `split("/")[0]`:
- Line 110: `peptide = psm.peptidoform.proforma.split("/")[0]`
- Line 119: `peptide_rt_diff_dict[psm.peptidoform.proforma.split("/")[0]]`
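To illustrate the effect on RT aggregation, a self-contained sketch with made-up ProForma strings and RT differences (not repository data). With `split("\\")`, two charge states of the same peptide end up under two keys; with `split("/")`, they collapse into one. The plain dict of lists is a hypothetical stand-in for `peptide_rt_diff_dict`.

```python
from collections import defaultdict

# Hypothetical (proforma, rt_diff) pairs: one peptide observed at charges 2 and 3.
psms = [("PEPTIDEK/2", 0.4), ("PEPTIDEK/3", 0.6), ("ACDK/2", 1.0)]


def group_rt_diffs(pairs, sep):
    """Group RT differences by the part of the ProForma string before `sep`."""
    grouped = defaultdict(list)
    for proforma, rt_diff in pairs:
        grouped[proforma.split(sep)[0]].append(rt_diff)
    return dict(grouped)


buggy = group_rt_diffs(psms, "\\")  # no backslash present, so keys keep the charge
fixed = group_rt_diffs(psms, "/")   # charge-agnostic keys

print(sorted(buggy))  # ['ACDK/2', 'PEPTIDEK/2', 'PEPTIDEK/3']
print(fixed)          # {'PEPTIDEK': [0.4, 0.6], 'ACDK': [1.0]}
```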