Enabling transfer learning in AlphaPeptDeep #40

Merged: ypriverol merged 32 commits into main from dev on Dec 4, 2025

Conversation

@ypriverol (Member) commented Dec 1, 2025

PR Type

Enhancement


Description

  • Add MS2 transfer learning support for AlphaPeptDeep model

  • Implement PPM tolerance unit support alongside Da units

  • Create MS2ModelManager class for fine-tuning capabilities

  • Add transfer learning configuration parameters to CLI and annotator

  • Refactor custom_correlate function to support model manager objects


Diagram Walkthrough

flowchart LR
  A["User Input<br/>transfer_learning<br/>ms2_tolerance_unit"] --> B["FeatureAnnotator<br/>Configuration"]
  B --> C["AlphaPeptDeepAnnotator"]
  C --> D["MS2ModelManager"]
  D --> E["Fine-tune MS2<br/>Model"]
  E --> F["Predictions with<br/>Retrained Model"]

File Walkthrough

Relevant files
Configuration changes
__init__.py
Version bump to 0.0.13                                                                     

quantmsrescore/__init__.py

  • Bump version from 0.0.12 to 0.0.13
+1/-1     
pyproject.toml
Version bump to 0.0.13                                                                     

pyproject.toml

  • Update version from 0.0.12 to 0.0.13
+1/-1     
Enhancement
alphapeptdeep.py
Add transfer learning and PPM tolerance support                   

quantmsrescore/alphapeptdeep.py

  • Add ms2_tolerance_unit parameter to support PPM and Da units
  • Import new MS2ModelManager class for transfer learning
  • Refactor custom_correlate function to accept ModelManager objects
    instead of model strings
  • Implement transfer learning logic in validate_features and
    add_features methods
  • Add transfer learning parameters: transfer_learning,
    transfer_learning_test_ratio, epoch_to_train_ms2
  • Create ms2_fine_tune function replacing make_prediction with
    fine-tuning capabilities
  • Add _get_targets_df_for_psm function for DataFrame-based target
    extraction
  • Update tolerance handling to support both PPM and Da units in
    _get_targets_for_psm (see the sketch below)
+260/-57
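The last bullet is the unit-aware matching. A minimal sketch of the idea (hypothetical helper; the PR's _get_targets_for_psm handles this inline and differs in detail): a ppm tolerance widens with fragment m/z, while a Da tolerance is constant.

import numpy as np

def tolerance_window(frag_mzs: np.ndarray, tolerance: float, unit: str = "Da") -> np.ndarray:
    # "Da": constant absolute window; "ppm": window scales with each m/z.
    if unit.lower() == "ppm":
        return frag_mzs * tolerance / 1e6
    return np.full_like(frag_mzs, tolerance, dtype=float)

frag_mzs = np.array([500.0, 1000.0, 2000.0])
print(tolerance_window(frag_mzs, 20.0, "ppm"))  # [0.01 0.02 0.04]
print(tolerance_window(frag_mzs, 0.02, "Da"))   # [0.02 0.02 0.02]

At 20 ppm the window is about 0.01 Da at m/z 500 but 0.04 Da at m/z 2000, which is why ppm support matters for high-resolution data.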
annotator.py
Add tolerance unit and transfer learning parameters           

quantmsrescore/annotator.py

  • Add ms2_tolerance_unit parameter to FeatureAnnotator initialization
  • Implement logic to enable AlphaPeptDeep when PPM tolerance is selected
  • Add transfer learning configuration parameters to annotator
  • Update _create_alphapeptdeep_annotator to pass tolerance unit and
    transfer learning parameters
  • Modify _find_and_apply_ms2_model to handle PPM tolerance by skipping
    MS2PIP
+68/-27 
ms2_model_manager.py
New MS2ModelManager for transfer learning                               

quantmsrescore/ms2_model_manager.py

  • Create new MS2ModelManager class extending ModelManager
  • Implement ms2_fine_tuning method for transfer learning configuration
  • Implement train_ms2_model method for model fine-tuning with PSM
    sampling (see the split sketch below)
  • Add save_ms2_model method to persist retrained model weights
  • Support grid NCE search and important PTM sampling during training
+174/-0 
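To make the sampling/splitting step concrete, a minimal sketch of a train/test split over high-confidence PSMs (plain pandas; the function name is illustrative, not the class's actual API):

import pandas as pd

def split_for_fine_tuning(psm_df: pd.DataFrame, test_ratio: float = 0.3,
                          seed: int = 42) -> tuple[pd.DataFrame, pd.DataFrame]:
    # Hold out a fraction of PSMs to evaluate the fine-tuned MS2 model;
    # train on the remainder.
    test_df = psm_df.sample(frac=test_ratio, random_state=seed)
    train_df = psm_df.drop(test_df.index)
    return train_df, test_df

The held-out fraction corresponds to the transfer_learning_test_ratio parameter threaded through the annotator and CLI.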
ms2rescore.py
Add CLI options for transfer learning and tolerance units

quantmsrescore/ms2rescore.py

  • Add --ms2_tolerance_unit CLI option supporting Da and ppm choices
  • Add transfer learning CLI options: --transfer_learning,
    --transfer_learning_test_ratio, --save_retrain_model,
    --epoch_to_train_ms2 (condensed sketch below)
  • Update msrescore2feature function signature to accept new parameters
  • Pass all new parameters to FeatureAnnotator initialization
+62/-22 
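For orientation, the new options follow standard click patterns; a condensed sketch assembled from the bullets above (defaults match the descriptions; flag styles are approximate):

import click

@click.command()
@click.option("--ms2_tolerance_unit",
              type=click.Choice(["Da", "ppm"], case_sensitive=False),
              default="Da", help="Fragment mass tolerance unit (default: Da)")
@click.option("--transfer_learning", is_flag=True,
              help="Enable MS2 transfer learning")
@click.option("--transfer_learning_test_ratio", type=float, default=0.30,
              help="Ratio of test data for MS2 transfer learning")
@click.option("--epoch_to_train_ms2", type=int, default=20,
              help="Epochs to train the AlphaPeptDeep MS2 model")
def demo(ms2_tolerance_unit, transfer_learning,
         transfer_learning_test_ratio, epoch_to_train_ms2):
    click.echo(f"unit={ms2_tolerance_unit}, transfer_learning={transfer_learning}")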

Summary by CodeRabbit

  • New Features

    • Unit-aware MS² tolerance (Da/ppm) that affects model selection and enables MS² fine-tuning via transfer learning, with a configurable test split, training epochs, weight reuse, and save options.
    • New MS² model manager to coordinate MS² models, support fine‑tuning, evaluation and saving of retrained models.
    • New CLI/annotator options to control tolerance unit, transfer‑learning, test ratio, save‑retrain, and training epochs.
  • Chores

    • Package version bumped to 0.0.13.
    • Pinned peptdeep dependency to 1.4.0.
  • CI

    • Added workflow step to free disk space during CI.


@ypriverol requested a review from Copilot on December 1, 2025 at 06:55

coderabbitai bot commented Dec 1, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds unit-aware MS2 tolerance (Da/ppm), a new MS2ModelManager for MS² fine‑tuning and persistence, threads transfer‑learning/save/epoch controls through AlphaPeptDeep/annotator flows, extends the CLI with related flags, pins peptdeep to 1.4.0, adds small CI cleanup steps, and bumps version 0.0.12 → 0.0.13.

Changes

Cohort / File(s): Summary

  • Version bump (pyproject.toml, quantmsrescore/__init__.py)
    Package version updated from 0.0.12 to 0.0.13.
  • MS2 model manager (quantmsrescore/ms2_model_manager.py)
    New MS2ModelManager class: manages pDeep/RT/CCS/charge models, supports loading/installing, ms2_fine_tuning, train_ms2_model, and save_ms2_model; includes sampling/splitting, optional NCE/instrument grid search, and verbose training/evaluation.
  • AlphaPeptDeep integration (quantmsrescore/alphapeptdeep.py)
    Adds ms2_tolerance_unit, new ms2_fine_tune() and _get_targets_df_for_psm(), routes MS2 flows through MS2ModelManager instances, returns model weights, and wires transfer-learning, epoch and save controls; updates multiple class/function signatures.
  • Annotator configuration (quantmsrescore/annotator.py)
    Constructor extended with ms2_tolerance_unit, transfer_learning, transfer_learning_test_ratio, save_retrain_model, epoch_to_train_ms2; unit-aware model selection between MS2PIP/AlphaPeptDeep, updated _create_alphapeptdeep_annotator() / _find_and_apply_ms2_model() and new _apply_alphapeptdeep_model.
  • CLI interface (quantmsrescore/ms2rescore.py)
    msrescore2feature() signature and click options extended with --ms2_tolerance_unit, --transfer_learning, --transfer_learning_test_ratio, --save_retrain_model, --epoch_to_train_ms2; new args forwarded to the annotator and documented.
  • Dependency pins (environment.yml, requirements.txt)
    peptdeep dependency pinned to peptdeep==1.4.0.
  • CI workflows (.github/workflows/...python-*.yml)
    Added a post-checkout step to free runner disk space by removing /opt/hostedtoolcache and printing disk usage before/after.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CLI as CLI (msrescore2feature)
    participant Annot as Annotator
    participant AP as AlphaPeptDeep
    participant MM as MS2ModelManager
    participant SM as SpectrumMatcher

    CLI->>Annot: init(..., ms2_tolerance_unit, transfer flags, epoch_to_train_ms2)
    Annot->>AP: create/annotator(ms2_tolerance_unit, model_mgr?)
    AP->>AP: custom_correlate(..., model_mgr?, ms2_tolerance_unit)
    alt transfer_learning enabled
        AP->>MM: ms2_fine_tuning(psms_df, match_intensity_df, ...)
        MM->>SM: build matched intensity frames (respect tolerance unit)
        MM->>MM: train_ms2_model(..., grid-search?, epochs)
        MM-->>AP: return trained model / model_weights
        AP->>SM: predict using trained model_weights
    else transfer_learning disabled
        AP->>SM: predict using selected existing model (MS2PIP or AlphaPeptDeep)
    end
    AP-->>Annot: results + model_weights
    alt save_retrain_model
        Annot->>MM: save_ms2_model()
    end
    Annot-->>CLI: annotated features returned

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Focus review on:
    • quantmsrescore/ms2_model_manager.py — sampling/splitting logic, grid-search behavior, device handling, save/load semantics, training correctness.
    • quantmsrescore/alphapeptdeep.py — ms2_fine_tune flow, model_mgr integration, target-matching respecting Da/ppm, and model_weights plumbing.
    • quantmsrescore/annotator.py & quantmsrescore/ms2rescore.py — public/CLI signature changes, correct flag forwarding, unit-aware model selection and fallback logic.
    • CI workflow steps — verify runner safety for removing /opt/hostedtoolcache.

Possibly related PRs

Poem

🐰 I hopped through code with tiny paws,
I learned of Da and ppm laws.
Models tuned and weights held tight,
Pip pinned firm — the tests take flight.
Version bumped — a rabbit's delight.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 76.92%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'Enabling transfer learning in AlphaPeptDeep' accurately and concisely describes the primary change of this pull request.


qodo-code-review bot commented Dec 1, 2025

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance: 🟢 No security concerns identified. No security vulnerabilities detected by AI analysis; human verification advised for critical code.
Ticket Compliance: 🎫 No ticket provided.
Codebase Duplication Compliance: codebase context is not defined. Follow the guide to enable codebase context checks.

Custom Compliance
🔴
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Unhandled Edge Cases: The ms2_fine_tune function assumes non-empty spectra and calibration sets, performing
operations like pd.concat on possibly empty lists and accessing DataFrame rows without
guarding against missing matches or empty results (a guard sketch follows this item).

Referred Code
              ms2_tolerance, ms2_tolerance_unit, processes, consider_modloss, higher_score_better,
              calibration_set_size, transfer_learning, transfer_learning_test_ratio, epoch_to_train_ms2):
if consider_modloss:
    frag_types = ['b_z1', 'y_z1', 'b_z2', 'y_z2',
                  'b_modloss_z1', 'b_modloss_z2',
                  'y_modloss_z1', 'y_modloss_z2']
else:
    frag_types = ['b_z1', 'y_z1', 'b_z2', 'y_z2']

if transfer_learning:
    enumerated_psm_list_copy = enumerated_psm_list.copy()
    # Select only PSMs that are target and not decoys
    enumerated_psm_list_copy = [
        psm
        for psm in enumerated_psm_list_copy
        if not psm.is_decoy and psm.rank == 1
    ]
    # Sort alphapeptdeep results by PSM score and lower score is better
    enumerated_psm_list_copy.sort(key=lambda x: x.score, reverse=higher_score_better)

    # Get a calibration set; the % of PSMs to be used for calibration is defined by calibration_set_size


 ... (clipped 86 lines)

Learn more about managing compliance generic rules or creating your own custom rules
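A guard of the kind the edge-case check asks for could look like this sketch (names mirror the clipped code but are not verbatim):

import pandas as pd

def concat_matched_frames(frames: list) -> pd.DataFrame:
    # Fail fast with an actionable message instead of letting pd.concat
    # raise on an empty list when no spectra matched the calibration set.
    if not frames:
        raise ValueError(
            "No matched spectra available for fine-tuning; "
            "check spectrum_id_pattern and the calibration set size."
        )
    return pd.concat(frames, ignore_index=True)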

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing Audit Logs: The new transfer learning and model selection flows execute critical actions (model
training, selection, saving) without emitting structured audit logs identifying user,
action, and outcome.

Referred Code
            if not alphapeptdeep_generator.validate_features(psm_list, psms_df):
                logger.error("Original model validation failed. No features added.")
                return  # Exit early since validation failed
            logger.info(f"Using original model: {model_to_use}")

        # Apply the selected model
        alphapeptdeep_generator.model = model_to_use
        alphapeptdeep_generator.add_features(psm_list, psms_df)
        logger.info(
            f"Successfully applied AlphaPeptDeep annotation using model: {alphapeptdeep_generator._peptdeep_model}")

    except Exception as e:
        logger.error(f"Failed to apply AlphaPeptDeep annotation: {e}")
        return  # Indicate failure through early return

    return  # Successful completion

def _create_alphapeptdeep_annotator(self, model: Optional[str] = None, tolerance: Optional[float] = None,
                                    tolerance_unit: Optional[str] = None):
    """
    Create an AlphaPeptDeep annotator with the specified or default model.


 ... (clipped 31 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status:
Ambiguous Names: Variables like model_mgr/model_weights and results naming in fine-tuning paths are
somewhat ambiguous regarding their types and roles, which may hinder readability.

Referred Code
def custom_correlate(
        psms: Union[PSMList, str, Path],
        psms_df: pd.DataFrame,
        spectrum_file: Union[str, Path],
        psm_filetype: Optional[str] = None,
        spectrum_id_pattern: Optional[str] = None,
        compute_correlations: bool = False,
        model: ModelManager = None,
        ms2_tolerance: float = 0.02,
        ms2_tolerance_unit: str = "Da",
        consider_modloss: bool = False,
        processes: Optional[int] = None,
        higher_score_better: bool = True,
        calibration_set_size: Optional[float] = 0.20,
        transfer_learning: bool = True,
        transfer_learning_test_ratio: float = 0.3,
        epoch_to_train_ms2: int = 20
) -> tuple[Union[list[ProcessingResult], Any], Any]:
    """
    Custom implementation of correlate that uses our custom spectrum reader.
    """


 ... (clipped 17 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Verbose Errors: Logging messages echo exceptions and internal model states (e.g., selected model object
and weights) that may expose internal implementation details to user-facing logs depending
on deployment configuration.

Referred Code
        if not alphapeptdeep_generator.validate_features(psm_list, psms_df):
            logger.error("Original model validation failed. No features added.")
            return  # Exit early since validation failed
        logger.info(f"Using original model: {model_to_use}")

    # Apply the selected model
    alphapeptdeep_generator.model = model_to_use
    alphapeptdeep_generator.add_features(psm_list, psms_df)
    logger.info(
        f"Successfully applied AlphaPeptDeep annotation using model: {alphapeptdeep_generator._peptdeep_model}")

except Exception as e:
    logger.error(f"Failed to apply AlphaPeptDeep annotation: {e}")
    return  # Indicate failure through early return

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Unstructured Logs: Newly added logs are free-form strings without structured fields and may include model
paths or identifiers; consider structured logging to avoid accidental sensitive data
leakage (a sketch using logging's extra fields follows this item).

Referred Code
            if not alphapeptdeep_generator.validate_features(psm_list, psms_df):
                logger.error("Original model validation failed. No features added.")
                return  # Exit early since validation failed
            logger.info(f"Using original model: {model_to_use}")

        # Apply the selected model
        alphapeptdeep_generator.model = model_to_use
        alphapeptdeep_generator.add_features(psm_list, psms_df)
        logger.info(
            f"Successfully applied AlphaPeptDeep annotation using model: {alphapeptdeep_generator._peptdeep_model}")

    except Exception as e:
        logger.error(f"Failed to apply AlphaPeptDeep annotation: {e}")
        return  # Indicate failure through early return

    return  # Successful completion

def _create_alphapeptdeep_annotator(self, model: Optional[str] = None, tolerance: Optional[float] = None,
                                    tolerance_unit: Optional[str] = None):
    """
    Create an AlphaPeptDeep annotator with the specified or default model.


 ... (clipped 31 lines)

Learn more about managing compliance generic rules or creating your own custom rules
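For the structured-logging concern, the standard library already supports attaching fields via the extra argument; a minimal sketch (field names are illustrative):

import logging

logger = logging.getLogger("quantmsrescore")

def log_model_applied(model_name: str) -> None:
    # Structured fields travel on the LogRecord and can be rendered as
    # JSON by a formatter, keeping identifiers out of free-form text.
    logger.info("ms2 model applied",
                extra={"model": model_name, "action": "apply_ms2_model"})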

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
CLI Validation Gaps: New CLI options (ms2_tolerance_unit, transfer learning flags and ratios, epochs) lack
explicit validation of ranges and combinations, which could lead to misuse or resource
exhaustion (a range-validation sketch follows this item).

Referred Code
    "--ms2_tolerance_unit",
    help="Fragment mass tolerance unit (default: Da)",
    type=click.Choice(['Da', 'ppm'], case_sensitive=False),
    default="Da",
)
@click.option(
    "--calibration_set_size",
    help="Percentage of number of psms to use for calibration and retraining (default: `0.20)",
    default=0.20,
)
@click.option(
    "--valid_correlations_size",
    help="Percentage of number of psms with correlation above the threshold (default: `0.70)",
    default=0.70,
)
@click.option(
    "--skip_deeplc_retrain",
    help="Skip retraining of DeepLC model (default: `False`)",
    is_flag=True,
)
@click.option(


 ... (clipped 56 lines)

Learn more about managing compliance generic rules or creating your own custom rules
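One way to close this gap is click's built-in range types (a sketch; the option names are taken from the PR, the bounds are suggested values):

import click

@click.command()
@click.option("--transfer_learning_test_ratio", default=0.30,
              type=click.FloatRange(0.0, 1.0),  # reject ratios outside [0, 1]
              help="Ratio of test data for MS2 transfer learning")
@click.option("--epoch_to_train_ms2", default=20,
              type=click.IntRange(1, 1000),  # cap epochs to avoid runaway training
              help="Epochs to train the AlphaPeptDeep MS2 model")
def demo(transfer_learning_test_ratio, epoch_to_train_ms2):
    pass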

Compliance status legend:
🟢 - Fully compliant
🟡 - Partially compliant
🔴 - Not compliant
⚪ - Requires further human verification
🏷️ - Compliance label


qodo-code-review bot commented Dec 1, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

High-level
Refactor fine-tuning logic for simplicity

The transfer learning implementation in alphapeptdeep.py should be refactored to
use the peptdeep library's native data preparation and fine-tuning workflows.
This will simplify the code, improve maintainability, and reduce custom logic
for training data creation.

Examples:

quantmsrescore/alphapeptdeep.py [773-873]
def ms2_fine_tune(enumerated_psm_list, psms_df, spec_file, spectrum_id_pattern, model_mgr,
                  ms2_tolerance, ms2_tolerance_unit, processes, consider_modloss, higher_score_better,
                  calibration_set_size, transfer_learning, transfer_learning_test_ratio, epoch_to_train_ms2):
    if consider_modloss:
        frag_types = ['b_z1', 'y_z1', 'b_z2', 'y_z2',
                      'b_modloss_z1', 'b_modloss_z2',
                      'y_modloss_z1', 'y_modloss_z2']
    else:
        frag_types = ['b_z1', 'y_z1', 'b_z2', 'y_z2']


 ... (clipped 91 lines)

Solution Walkthrough:

Before:

# in quantmsrescore/alphapeptdeep.py
def ms2_fine_tune(...):
    # ... select calibration_set from PSMs ...
    precursor_df = ... # from calibration_set
    
    # Manually iterate through spectra to build training data
    match_intensity_df = []
    for spectrum in read_spectrum_file(spec_file):
        # ... find matching PSMs for spectrum ...
        for psm in psms_for_spectrum:
            # Match theoretical fragments to observed spectrum peaks
            match_intensity = _get_targets_df_for_psm(...) 
            match_intensity_df.append(match_intensity)
    
    match_intensity_df = pd.concat(match_intensity_df)

    # Call fine-tuning with manually prepared data
    model_mgr.ms2_fine_tuning(precursor_df, match_intensity_df, ...)

After:

# in quantmsrescore/alphapeptdeep.py
def ms2_fine_tune(...):
    # ... select calibration_set from PSMs ...
    precursor_df_for_tuning = ... # from calibration_set

    # Let peptdeep handle data preparation and fine-tuning
    # The model manager would internally handle spectrum reading,
    # matching, and target intensity generation.
    model_mgr.fine_tune_ms2_model(
        psm_df=precursor_df_for_tuning,
        spectrum_files=[spec_file],
        ... # other tuning parameters
    )
    
    # ... rest of the prediction logic ...
Suggestion importance[1-10]: 9


Why: The suggestion correctly identifies a major design issue where the PR manually re-implements complex data preparation logic in ms2_fine_tune, instead of using the peptdeep library's native capabilities, significantly impacting maintainability and code simplicity.

Impact: High
Possible issue
Fix incorrect DataFrame indexing

Fix an indexing error by filtering matched_intensity_df using the indices of the
sampled tr_df to create tr_inten_df, ensuring data alignment for model training.

quantmsrescore/ms2_model_manager.py [91-97]

 if len(tr_df) > 0:
-    tr_inten_df = pd.DataFrame()
-    for frag_type in self.ms2_model.charged_frag_types:
-        if frag_type in matched_intensity_df.columns:
-            tr_inten_df[frag_type] = matched_intensity_df[frag_type]
-        else:
-            tr_inten_df[frag_type] = 0.0
+    tr_inten_df = matched_intensity_df.loc[tr_df.index].copy()
+    missing_cols = set(self.ms2_model.charged_frag_types) - set(tr_inten_df.columns)
+    for col in missing_cols:
+        tr_inten_df[col] = 0.0
+    tr_inten_df = tr_inten_df[self.ms2_model.charged_frag_types]
Suggestion importance[1-10]: 8


Why: The suggestion correctly identifies a critical bug where tr_inten_df is not correctly filtered to match the sampled tr_df, which would lead to a KeyError or data mismatch during model training.

Impact: Medium
Conditionally save the retrained model
Suggestion Impact: The commit wraps the model_weights.save_ms2_model() call in a conditional check using self._save_retrain_model, preventing unconditional saving of the retrained model.

code diff:

-                    model_weights.save_ms2_model()
+                    if self._save_retrain_model:
+                        model_weights.save_ms2_model()

Conditionally save the retrained model by wrapping the
model_weights.save_ms2_model() call in an if self._save_retrain_model: block.

quantmsrescore/alphapeptdeep.py [543-559]

 alphapeptdeep_results, model_weights = custom_correlate(
     psms=psm_list_run,
     spectrum_file=str(spectrum_filename),
     psms_df=psms_df,
     spectrum_id_pattern=self.spectrum_id_pattern,
     model=model_mgr,
     ms2_tolerance=self.ms2_tolerance,
     ms2_tolerance_unit=self.ms2_tolerance_unit,
     compute_correlations=True,
     processes=self.processes,
     higher_score_better=self._higher_score_better,
     calibration_set_size=self._calibration_set_size,
     transfer_learning=transfer_learning,
     transfer_learning_test_ratio=self._transfer_learning_test_ratio,
     epoch_to_train_ms2=self._epoch_to_train_ms2
 )
-model_weights.save_ms2_model()
+if self._save_retrain_model:
+    model_weights.save_ms2_model()

[Suggestion processed]

Suggestion importance[1-10]: 6


Why: The suggestion correctly identifies that the newly added _save_retrain_model flag is not used, causing the retrained model to be saved unconditionally, which contradicts the feature's intent.

Impact: Low
General
Clarify feature generator selection logic
Suggestion Impact: The commit modified the selection logic to raise a ValueError if 'ms2pip' is used with 'ppm' tolerance, and removed the previous silent fallback to AlphaPeptDeep with a warning.

code diff:

@@ -103,13 +106,13 @@
         self._deepLC = "deeplc" in feature_annotators
         if "ms2pip" in feature_annotators and ms2_tolerance_unit == "Da":
             self._ms2pip = True
+        elif "ms2pip" in feature_annotators and ms2_tolerance_unit == "ppm":
+            raise ValueError(
+                "MS2PIP only supports Da units. Please remove 'ms2pip' from feature_generators or set ms2_tolerance_unit to 'Da'.")
         else:
             self._ms2pip = False
         if "alphapeptdeep" in feature_annotators:
             self._alphapeptdeep = True
-        elif "ms2pip" in feature_annotators and ms2_tolerance_unit == "ppm":
-            self._alphapeptdeep = True
-            logger.warning("MS2PIP only supports Da units. AlphaPeptDeep is enabled when setting ppm tolerance!")
         else:
             self._alphapeptdeep = False

Refactor the feature generator selection logic to raise a ValueError if ms2pip
is used with ppm tolerance, instead of silently switching to alphapeptdeep.

quantmsrescore/annotator.py [104-114]

-if "ms2pip" in feature_annotators and ms2_tolerance_unit == "Da":
-    self._ms2pip = True
-else:
-    self._ms2pip = False
-if "alphapeptdeep" in feature_annotators:
+self._ms2pip = "ms2pip" in feature_annotators
+self._alphapeptdeep = "alphapeptdeep" in feature_annotators
+
+if self._ms2pip and ms2_tolerance_unit == "ppm":
+    raise ValueError("MS2PIP only supports 'Da' tolerance unit. Use 'alphapeptdeep' for 'ppm' tolerance.")
+
+if not self._alphapeptdeep and not self._ms2pip and ms2_tolerance_unit == "ppm":
     self._alphapeptdeep = True
-elif "ms2pip" in feature_annotators and ms2_tolerance_unit == "ppm":
-    self._alphapeptdeep = True
-    logger.warning("MS2PIP only supports Da units. AlphaPeptDeep is enabled when setting ppm tolerance!")
-else:
-    self._alphapeptdeep = False
+    logger.warning("Tolerance unit is 'ppm', enabling AlphaPeptDeep as the MS2 feature generator.")

[Suggestion processed]

Suggestion importance[1-10]: 7


Why: The suggestion improves the logic for selecting feature generators by making it more explicit and robust, preventing silent, unexpected behavior and instead raising an error for invalid configurations.

Impact: Medium

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
quantmsrescore/annotator.py (1)

506-522: Potential UnboundLocalError: ms2pip_generator may be undefined when tolerance unit is not "Da".

When ms2_tolerance_unit != "Da", ms2pip_generator is never initialized (lines 436-443), but it's referenced in this else branch (lines 507, 512, 519, 520). If alphapeptdeep_best_corr <= ms2pip_best_corr (e.g., both are -1 in an edge case), this code path will fail.

Consider guarding this branch or ensuring AlphaPeptDeep path is always taken when unit is ppm:

             else:
+                if self._ms2_tolerance_unit != "Da":
+                    logger.error("MS2PIP is not available with ppm tolerance unit")
+                    return
                 if ms2pip_best_model and ms2pip_generator.validate_features(psm_list=psm_list, model=ms2pip_best_model):

Or refactor the comparison to ensure the ppm case always selects AlphaPeptDeep.

quantmsrescore/alphapeptdeep.py (2)

1053-1082: Incorrect return type annotation and outdated docstring.

The return type says Tuple[Optional[np.ndarray], Dict[str, np.ndarray]] but the function returns Tuple[np.ndarray, np.ndarray] (two arrays for b and y intensities). The docstring also references non-existent parameters:

 def _get_targets_for_psm(
         b_frag_mzs: np.array,
         y_frag_mzs: np.array,
         spectrum: ObservedSpectrum,
         ms2_tolerance: float,
         ms2_tolerance_unit: str
-) -> Tuple[Optional[np.ndarray], Dict[str, np.ndarray]]:
+) -> Tuple[np.ndarray, np.ndarray]:
     """
     Get targets for a PSM from a spectrum.

     Parameters
     ----------
-    psm
-        The PSM to get targets for.
+    b_frag_mzs
+        Array of b-ion fragment m/z values.
+    y_frag_mzs
+        Array of y-ion fragment m/z values.
     spectrum
         The spectrum to get targets from.
-    encoder
-        The encoder to use for peptide and peptidoform encoding.
     ms2_tolerance
         The MS2 tolerance to use.
-    model
-        The model name.
-    ion_types
-        The ion types to use.
+    ms2_tolerance_unit
+        Unit for tolerance ("Da" or "ppm").

     Returns
     -------
-    Tuple[Optional[np.ndarray], Dict[str, np.ndarray]]
-        A tuple containing the encoded peptidoform and the targets.
+    Tuple[np.ndarray, np.ndarray]
+        Tuple of (b_ion_intensities, y_ion_intensities), normalized by max intensity.
     """

559-568: save_ms2_model() call is in wrong scope for exception handling.

If save_ms2_model() fails, it will be caught by the NoMatchingSpectraFound handler, which would incorrectly raise a FeatureGeneratorException about spectrum matching. Move the save call outside the try block:

                     alphapeptdeep_results, model_weights = custom_correlate(
                         ...
                     )
-                    model_weights.save_ms2_model()
-
                 except NoMatchingSpectraFound as e:
                     raise FeatureGeneratorException(
                         ...
                     ) from e
+                if self._save_retrain_model:
+                    model_weights.save_ms2_model()
                 self._calculate_features(psm_list_run, alphapeptdeep_results)
🧹 Nitpick comments (14)
quantmsrescore/ms2_model_manager.py (2)

13-17: Use Optional[str] for the model_dir parameter.

Per PEP 484, parameters with None as default should use Optional[T] or T | None type hint.

+from typing import Optional
+
 class MS2ModelManager(ModelManager):
     def __init__(self,
                  mask_modloss: bool = False,
                  device: str = "gpu",
-                 model_dir: str = None,
+                 model_dir: Optional[str] = None,
                  ):

173-174: Consider making the output filename configurable.

The hardcoded filename "retrained_ms2.pth" may cause overwrites if this method is called multiple times or from different contexts.

-    def save_ms2_model(self):
-        self.ms2_model.save("retrained_ms2.pth")
+    def save_ms2_model(self, filename: str = "retrained_ms2.pth"):
+        """Save the trained MS2 model to a file."""
+        self.ms2_model.save(filename)
quantmsrescore/ms2rescore.py (2)

122-124: Add explicit type=float for transfer_learning_test_ratio.

Without an explicit type, Click may not properly validate or convert the input value.

 @click.option("--transfer_learning_test_ratio",
               help="The ratio of test data for MS2 transfer learning",
+              type=float,
               default=0.30)

215-216: Complete the docstring for epoch_to_train_ms2.

The documentation is missing the default value description.

     epoch_to_train_ms2: int, optional
         Epoch to train AlphaPeptDeep MS2 model.
+        Defaults to 20.
quantmsrescore/annotator.py (2)

352-354: Consider using logger.exception() for better error diagnostics.

logger.exception() automatically includes the stack trace, which aids debugging.

         except Exception as e:
-            logger.error(f"Failed to apply AlphaPeptDeep annotation: {e}")
+            logger.exception(f"Failed to apply AlphaPeptDeep annotation: {e}")
             return  # Indicate failure through early return

441-443: Consider using logger.exception() here as well for consistency.

             except Exception as e:
-                logger.error(f"Failed to initialize MS2PIP: {e}")
+                logger.exception(f"Failed to initialize MS2PIP: {e}")
                 raise
quantmsrescore/alphapeptdeep.py (8)

43-43: Documentation missing for new ms2_tolerance_unit parameter.

The docstring describes ms2_tolerance but doesn't document the new ms2_tolerance_unit parameter. Consider adding documentation for consistency.

         ms2_tolerance
-            MS2 mass tolerance in Da. Defaults to :py:const:`0.02`.
+            MS2 mass tolerance value. Defaults to :py:const:`0.02`.
+        ms2_tolerance_unit
+            Unit for MS2 mass tolerance, either "Da" or "ppm". Defaults to :py:const:`Da`.

Also applies to: 58-59


461-470: Duplicated model manager initialization logic.

This pattern for initializing model_mgr and determining transfer_learning is repeated identically in validate_features, add_features, and _find_best_ms2_model. Consider extracting to a helper method:

def _get_model_manager(self) -> Tuple[MS2ModelManager, bool]:
    """Get or create model manager and determine transfer learning flag."""
    if self._peptdeep_model:
        return self._peptdeep_model, False
    model_mgr = MS2ModelManager(
        mask_modloss=not self._consider_modloss,
        device="cpu",
        model_dir=self.model_dir
    )
    return model_mgr, self._transfer_learning

798-800: Remove commented-out code.

These commented lines appear to be leftover debugging/development code. Remove them to improve readability.

-        # random.shuffle(calibration_set)
-        # train_set = calibration_set[: len(calibration_set) // 2]
-        # valid_set = calibration_set[len(calibration_set)]

828-832: Use raise ... from for proper exception chaining.

Per static analysis, exceptions raised within except clauses should use raise ... from to preserve the exception chain:

             except (TypeError, IndexError):
-                raise exceptions.TitlePatternError(
+                raise exceptions.TitlePatternError(
                     f"Spectrum title pattern `{spectrum_id_pattern}` could not be matched to "
                     f"spectrum ID `{spectrum.identifier}`. "
                     " Are you sure that the regex contains a capturing group?"
-                )
+                ) from None

842-842: Unused loop variable psm_idx.

The variable psm_idx is not used within the loop body. Use underscore prefix to indicate intentionally unused:

-            for psm_idx, psm in psms_by_specid[spectrum_id]:
+            for _psm_idx, psm in psms_by_specid[spectrum_id]:

1140-1142: Remove commented-out debug code.

-    # print(spectrum.mz)
-    # print(ms2_tolerance)  # 0.05
-    # spec_mz_tols = np.full_like(spectrum.mz, ms2_tolerance)

1088-1090: Remove commented-out debug code.

-    # print(spectrum.mz)
-    # print(ms2_tolerance)  # 0.05
-    # spec_mz_tols = np.full_like(spectrum.mz, ms2_tolerance)

741-741: Consider updating type hint to MS2ModelManager.

The model parameter is typed as ModelManager but the function actually expects and uses MS2ModelManager (which calls ms2_fine_tuning method). Consider updating the type annotation for better type safety:

-        model: ModelManager = None,
+        model: MS2ModelManager = None,
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b6ec605 and 1bb8473.

📒 Files selected for processing (6)
  • pyproject.toml (1 hunks)
  • quantmsrescore/__init__.py (1 hunks)
  • quantmsrescore/alphapeptdeep.py (17 hunks)
  • quantmsrescore/annotator.py (10 hunks)
  • quantmsrescore/ms2_model_manager.py (1 hunks)
  • quantmsrescore/ms2rescore.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
quantmsrescore/ms2rescore.py (2)
quantmsrescore/annotator.py (3)
  • build_idxml_data (144-190)
  • annotate (192-227)
  • write_idxml_file (229-253)
quantmsrescore/openms.py (1)
  • write_idxml_file (251-274)
quantmsrescore/alphapeptdeep.py (2)
quantmsrescore/openms.py (1)
  • OpenMSHelper (41-632)
quantmsrescore/ms2_model_manager.py (3)
  • MS2ModelManager (12-174)
  • save_ms2_model (173-174)
  • ms2_fine_tuning (42-59)
🪛 Ruff (0.14.6)
quantmsrescore/ms2_model_manager.py

16-16: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

quantmsrescore/ms2rescore.py

134-134: Unused function argument: ctx

(ARG001)

quantmsrescore/annotator.py

352-352: Do not catch blind exception: Exception

(BLE001)


353-353: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


442-442: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

quantmsrescore/alphapeptdeep.py

828-832: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


828-832: Avoid specifying long messages outside the exception class

(TRY003)


842-842: Loop control variable psm_idx not used within loop body

(B007)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Agent
  • GitHub Check: build
  • GitHub Check: build (3.11)
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (10)
pyproject.toml (1)

6-6: LGTM!

Version bump to 0.0.13 aligns with the new transfer learning and MS2 tolerance unit features introduced in this PR.

quantmsrescore/__init__.py (1)

7-7: LGTM!

Version string matches the updated pyproject.toml version.

quantmsrescore/ms2_model_manager.py (1)

42-59: LGTM!

The fine-tuning entry point correctly stores hyperparameters and delegates to train_ms2_model.

quantmsrescore/ms2rescore.py (3)

81-86: LGTM!

Good use of click.Choice to validate tolerance unit input, with sensible default preserving backward compatibility.


133-158: LGTM!

Function signature correctly includes all new parameters. The unused ctx parameter is a common Click pattern when using @click.pass_context.


219-240: LGTM!

All new parameters are correctly propagated to the FeatureAnnotator constructor.

quantmsrescore/annotator.py (3)

104-114: LGTM!

Good handling of the MS2PIP limitation (Da-only) with automatic fallback to AlphaPeptDeep when ppm is specified, including an appropriate warning to inform the user.


349-350: LGTM!

Logging the actual model used from _peptdeep_model provides more accurate traceability of which model was applied.


358-392: LGTM!

Method signature correctly extended with tolerance_unit parameter, and all transfer learning parameters properly propagated to the AlphaPeptDeepAnnotator.

quantmsrescore/alphapeptdeep.py (1)

875-879: Verify prediction flow after transfer learning.

The predict_all call uses psms_df (all PSMs) for prediction after fine-tuning on the calibration subset. This appears intentional (fine-tune on best PSMs, then predict for all), but please confirm this is the expected behavior for the transfer learning workflow.
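The pattern being questioned, fine-tune on a high-confidence subset and then predict for everything, is roughly the following (a sketch; fit/predict are placeholders, not peptdeep's API):

def fine_tune_then_predict(model, all_psms_df, calibration_df, matched_intensity_df):
    # Fine-tune only on confident rank-1 target PSMs...
    model.fit(calibration_df, matched_intensity_df)
    # ...then predict fragment intensities for the full PSM set, so decoys
    # and lower-ranked hits are scored by the adapted model as well.
    return model.predict(all_psms_df)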

Copilot AI (Contributor) left a comment

Pull request overview

This pull request enables transfer learning functionality in AlphaPeptDeep for the quantms-rescoring workflow. The changes introduce the ability to fine-tune MS2 prediction models using experimental data, with configurable options for training epochs, test data ratios, and model saving. Additionally, the PR adds support for ppm-based MS2 tolerance units alongside the existing Da units.

Key Changes:

  • Added transfer learning capabilities to AlphaPeptDeep MS2 prediction with new CLI options (--transfer_learning, --transfer_learning_test_ratio, --save_retrain_model, --epoch_to_train_ms2)
  • Implemented MS2 tolerance unit selection (Da or ppm) via --ms2_tolerance_unit parameter
  • Created new MS2ModelManager class to manage model training and fine-tuning operations
  • Version bumped from 0.0.12 to 0.0.13

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 21 comments.

Show a summary per file
File Description
quantmsrescore/ms2rescore.py Added CLI options for transfer learning parameters and MS2 tolerance unit; updated function signature to include new parameters
quantmsrescore/ms2_model_manager.py New file implementing MS2ModelManager class for model fine-tuning and training operations
quantmsrescore/annotator.py Updated FeatureAnnotator to support transfer learning and ppm tolerance; added logic to switch from MS2PIP to AlphaPeptDeep when ppm is selected
quantmsrescore/alphapeptdeep.py Modified to support ModelManager architecture, added ms2_fine_tune function for transfer learning, updated custom_correlate to return model weights
quantmsrescore/__init__.py Version bump to 0.0.13
pyproject.toml Version bump to 0.0.13


Comment on lines 133 to 158
Before:

def msrescore2feature(
    ctx,
    idxml: str,
    mzml,
    output: str,
    log_level,
    processes,
    feature_generators,
    only_features,
    ms2_model_dir,
    ms2_model,
    force_model,
    find_best_model,
    ms2_tolerance,
    calibration_set_size,
    valid_correlations_size,
    skip_deeplc_retrain,
    spectrum_id_pattern: str,
    psm_id_pattern: str,
    consider_modloss
):

After:

def msrescore2feature(
    ctx,
    idxml: str,
    mzml,
    output: str,
    log_level,
    processes,
    feature_generators,
    only_features,
    ms2_model_dir,
    ms2_model,
    ms2_tolerance,
    ms2_tolerance_unit,
    force_model,
    find_best_model,
    calibration_set_size,
    valid_correlations_size,
    skip_deeplc_retrain,
    spectrum_id_pattern: str,
    psm_id_pattern: str,
    consider_modloss,
    transfer_learning,
    transfer_learning_test_ratio,
    save_retrain_model,
    epoch_to_train_ms2
):
Copilot AI commented on Dec 1, 2025

The function parameter order does not match the Click decorator order. Click passes arguments to the function in the order they are decorated (bottom-to-top in decorators, which becomes top-to-bottom in the function). The current parameter order will cause arguments to be assigned to the wrong parameters.

The correct order based on the decorators should be:

def msrescore2feature(
    ctx,
    idxml: str,
    mzml,
    output: str,
    log_level,
    processes,
    feature_generators,
    force_model,
    find_best_model,
    only_features,
    ms2_model,
    ms2_model_dir,
    ms2_tolerance,
    ms2_tolerance_unit,
    calibration_set_size,
    valid_correlations_size,
    skip_deeplc_retrain,
    spectrum_id_pattern: str,
    psm_id_pattern: str,
    consider_modloss,
    transfer_learning,
    transfer_learning_test_ratio,
    save_retrain_model,
    epoch_to_train_ms2
):

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
quantmsrescore/annotator.py (1)

436-528: Critical: Undefined variable ms2pip_generator in fallback path when ms2_tolerance_unit != "Da".

When ms2_tolerance_unit is not "Da", ms2pip_generator is never created (lines 436-443 are skipped). However, if alphapeptdeep_best_corr <= ms2pip_best_corr (e.g., both are -1 due to correlation failures), the else branch at lines 506-522 will execute and reference ms2pip_generator at line 507, causing a NameError.

Apply this diff to guard the MS2PIP path:

             if alphapeptdeep_best_corr > ms2pip_best_corr:
                 model_to_use = alphapeptdeep_best_model  # AlphaPeptdeep only has generic model
                 if alphapeptdeep_best_model and alphapeptdeep_generator.validate_features(psm_list=psm_list,
                                                                                           psms_df=psms_df,
                                                                                           model=alphapeptdeep_best_model):
                     logger.info(
                         f"Using best model: {alphapeptdeep_best_model} with correlation: {alphapeptdeep_best_corr:.4f}")
                 else:
                     # Fallback to original model if best model doesn't validate
                     if alphapeptdeep_generator.validate_features(psm_list, psms_df, model=alphapeptdeep_best_model):
                         logger.warning("Best model validation failed, falling back to original model")
                     else:
                         logger.error("Both best model and original model validation failed")
                         return  # Exit early since no valid model is available
 
                 # Apply the selected model
                 alphapeptdeep_generator.model = model_to_use
                 alphapeptdeep_generator.add_features(psm_list, psms_df)
                 logger.info(f"Successfully applied AlphaPeptDeep annotation using model: {model_to_use}")
                 self.ms2_generator = "AlphaPeptDeep"
 
-            else:
+            elif self._ms2_tolerance_unit == "Da":
+                # Only use MS2PIP if tolerance unit is Da (ms2pip_generator exists)
                 if ms2pip_best_model and ms2pip_generator.validate_features(psm_list=psm_list, model=ms2pip_best_model):
                     model_to_use = ms2pip_best_model
                     logger.info(f"Using best model: {model_to_use} with correlation: {ms2pip_best_corr:.4f}")
                 else:
                     # Fallback to original model if best model doesn't validate
                     if ms2pip_generator.validate_features(psm_list, model=original_model):
                         logger.warning("Best model validation failed, falling back to original model")
                     else:
                         logger.error("Both best model and original model validation failed")
                         return  # Exit early since no valid model is available
 
                 # Apply the selected model
                 ms2pip_generator.model = model_to_use
                 ms2pip_generator.add_features(psm_list)
                 logger.info(f"Successfully applied MS2PIP annotation using model: {model_to_use}")
                 self.ms2_generator = "MS2PIP"
+            else:
+                logger.error("No valid MS2 model available for the current configuration")
+                return
♻️ Duplicate comments (2)
quantmsrescore/annotator.py (2)

81-82: Docstring type mismatch: epoch_to_train_ms2 is documented as bool but is int.

The parameter type should be int to match the function signature and default value of 20.

Apply this diff:

-        epoch_to_train_ms2: bool, optional
-            Epoch to train AlphaPeptDeep MS2 model.
+        epoch_to_train_ms2: int, optional
+            Number of epochs to train AlphaPeptDeep MS2 model.
+            Defaults to 20.

494-494: Logic error: Fallback validates alphapeptdeep_best_model twice instead of original_model.

The comment indicates "falling back to original model," but the code validates alphapeptdeep_best_model again on line 494, which is the same model that just failed validation on line 487-489.

Apply this diff:

                 else:
                     # Fallback to original model if best model doesn't validate
-                    if alphapeptdeep_generator.validate_features(psm_list, psms_df, model=alphapeptdeep_best_model):
+                    if alphapeptdeep_generator.validate_features(psm_list, psms_df, model=original_model):
                         logger.warning("Best model validation failed, falling back to original model")
                     else:
                         logger.error("Both best model and original model validation failed")
                         return  # Exit early since no valid model is available
🧹 Nitpick comments (1)
quantmsrescore/annotator.py (1)

104-114: Consider simplifying the MS2PIP/AlphaPeptDeep selection logic.

The nested conditionals mix feature selection with unit validation. Extracting the decision into a helper method or restructuring the logic could improve readability.

For example:

# Determine MS2 predictor based on requested features and tolerance unit
if "ms2pip" in feature_annotators:
    if ms2_tolerance_unit == "Da":
        self._ms2pip = True
    else:  # ppm
        self._ms2pip = False
        self._alphapeptdeep = True
        logger.warning("MS2PIP only supports Da units. AlphaPeptDeep is enabled when setting ppm tolerance!")
else:
    self._ms2pip = False

if "alphapeptdeep" in feature_annotators:
    self._alphapeptdeep = True
elif not self._ms2pip:
    self._alphapeptdeep = False
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1bb8473 and d5911f4.

📒 Files selected for processing (1)
  • quantmsrescore/annotator.py (10 hunks)
🧰 Additional context used
🪛 Ruff (0.14.6)
quantmsrescore/annotator.py

352-352: Do not catch blind exception: Exception

(BLE001)


353-353: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


442-442: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build (3.11)
  • GitHub Check: build
  • GitHub Check: Codacy Static Code Analysis

ypriverol and others added 2 commits December 1, 2025 09:28
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

♻️ Duplicate comments (7)
quantmsrescore/alphapeptdeep.py (7)

187-198: Type mismatch: passing string where ModelManager expected.

self.model is a string (e.g., "generic"), but custom_correlate now expects a ModelManager object at line 741. This will cause a runtime error.

Either create a ModelManager instance here (similar to how AlphaPeptDeepAnnotator does it at lines 461-470), or reconsider whether AlphaPeptDeepFeatureGenerator should support the new model manager architecture.


751-751: Consider using uppercase type hints for better compatibility.

The lowercase tuple[...] and list[...] require Python 3.9+. For broader compatibility, consider using Tuple and List from the typing module (already imported at line 9), or add from __future__ import annotations at the top of the file.


798-800: Remove or document commented-out code.

These commented-out lines suggest incomplete functionality. If train/valid split is not needed, remove the comments. If it might be needed later, add a TODO comment explaining when/why it would be implemented.


1088-1090: Remove commented-out debug code.

These commented-out print statements and the redundant assignment should be removed to keep the code clean.


461-470: Extract repeated model manager initialization logic.

This pattern of checking self._peptdeep_model and initializing MS2ModelManager is repeated three times (here, lines 532-541, and lines 669-678). Extract it into a helper method to reduce duplication and improve maintainability.

Also, the if-else at lines 467-470 can be simplified to transfer_learning = self._transfer_learning.


559-559: Make model save conditional on _save_retrain_model flag.

The model is saved unconditionally, but it should only be saved when self._save_retrain_model is True.

Apply this diff:

-                    model_weights.save_ms2_model()
+                    if self._save_retrain_model:
+                        model_weights.save_ms2_model()

1106-1132: Fix incorrect return type annotation and outdated docstring.

The return type annotation says Tuple[Optional[np.ndarray], Dict[str, np.ndarray]], but the function returns a pd.DataFrame (line 1152). Also, the docstring parameters (psm, encoder, model, ion_types) don't match the actual function parameters.

Update the signature and docstring:

 def _get_targets_df_for_psm(
         mz_df: pd.DataFrame,
         spectrum: ObservedSpectrum,
         ms2_tolerance: float,
         ms2_tolerance_unit: str
-) -> Tuple[Optional[np.ndarray], Dict[str, np.ndarray]]:
+) -> pd.DataFrame:
     """
     Get targets for a PSM from a spectrum.

     Parameters
     ----------
-    psm
-        The PSM to get targets for.
+    mz_df
+        DataFrame containing theoretical m/z values.
     spectrum
         The spectrum to get targets from.
-    encoder
-        The encoder to use for peptide and peptidoform encoding.
     ms2_tolerance
         The MS2 tolerance to use.
-    model
-        The model name.
-    ion_types
-        The ion types to use.
+    ms2_tolerance_unit
+        Unit for tolerance ("Da" or "ppm").

     Returns
     -------
-    Tuple[Optional[np.ndarray], Dict[str, np.ndarray]]
-        A tuple containing the encoded peptidoform and the targets.
+    pd.DataFrame
+        DataFrame with matched intensities, normalized by max intensity.
     """
🧹 Nitpick comments (1)
quantmsrescore/alphapeptdeep.py (1)

828-832: Add exception chaining for better error context.

When raising an exception within an except clause, use raise ... from err to preserve the traceback chain and provide better debugging context.

Apply this diff:

-        except (TypeError, IndexError):
+        except (TypeError, IndexError) as err:
             raise exceptions.TitlePatternError(
                 f"Spectrum title pattern `{spectrum_id_pattern}` could not be matched to "
                 f"spectrum ID `{spectrum.identifier}`. "
                 " Are you sure that the regex contains a capturing group?"
-            )
+            ) from err
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5911f4 and 7cec908.

📒 Files selected for processing (1)
  • quantmsrescore/alphapeptdeep.py (17 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
quantmsrescore/alphapeptdeep.py (1)
quantmsrescore/ms2_model_manager.py (3)
  • MS2ModelManager (12-174)
  • save_ms2_model (173-174)
  • ms2_fine_tuning (42-59)
🪛 Ruff (0.14.6)
quantmsrescore/alphapeptdeep.py

828-832: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


828-832: Avoid specifying long messages outside the exception class

(TRY003)


842-842: Loop control variable psm_idx not used within loop body

(B007)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build (3.11)
  • GitHub Check: build

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
.github/workflows/python-package.yml (1)

23-27: Good operational housekeeping—disk space freed early in workflow.

This step removes cached tools to free ~3–5 GB on GitHub's hosted runner before the build begins. Placement after checkout but before Python setup is appropriate.

Suggest monitoring disk usage post-merge to ensure the freed space is sufficient for the expanded test suite and dependencies introduced by this PR's MS2 transfer learning and new MS2ModelManager features. If tests still fail with "insufficient disk space" errors, consider further cleanup (e.g., removing test data or caches mid-workflow).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe3bfef and 3dac322.

📒 Files selected for processing (4)
  • .github/workflows/python-package.yml (1 hunks)
  • .github/workflows/python-publish.yml (1 hunks)
  • environment.yml (1 hunks)
  • requirements.txt (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • .github/workflows/python-publish.yml
🔇 Additional comments (2)
environment.yml (1)

22-22: Good consistency with requirements.txt.

Both requirements.txt and environment.yml pin peptdeep to 1.4.0 using appropriate syntax for each package manager. Ensure this version is verified for compatibility with the new MS2 transfer learning features (as flagged in requirements.txt).

requirements.txt (1)

12-12: Exact pinning to peptdeep 1.4.0 is appropriate for MS2 model features.

The pinning is justified: MS2ModelManager directly uses pDeepModel and MS2 utilities (ms2_fine_tuning, train_ms2_model, grid_nce_search) that are introduced or formalized in peptdeep 1.4.0. However, remove the requirement to verify compatibility with ms2rescore==3.1.4, as ms2rescore and peptdeep are separate frameworks without a direct dependency relationship; ms2rescore 3.1.4 uses MS2PIP, DeepLC, and IM2Deep as integrated feature generators, while peptdeep is an independent MS2 prediction framework used separately in this codebase.

Likely an incorrect or invalid review comment.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
quantmsrescore/annotator.py (1)

19-25: Keep class/constructor docstrings in sync with new MS2 and transfer-learning options

The runtime behavior looks good, but the docs are lagging:

  • The class docstring (Line 21) still says “using MS2PIP and DeepLC models” while the class now also supports AlphaPeptDeep.
  • ms2_tolerance_unit is a public argument to __init__ but isn’t documented, so callers don’t see the allowed values ("Da" vs "ppm").
  • transfer_learning is a public flag but isn’t described in the parameter list (only its ratio, save flag, and epochs are documented).

Updating the docstrings will make the new knobs discoverable and avoid confusion for CLI/API users.

You can patch the constructor docstring along these lines:

-    Annotator for peptide-spectrum matches (PSMs) using MS2PIP and DeepLC models.
+    Annotator for peptide-spectrum matches (PSMs) using MS2PIP, AlphaPeptDeep and DeepLC models.
@@
-        ms2_tolerance : float, optional
-            MS2 tolerance for feature generation (default: 0.05).
+        ms2_tolerance : float, optional
+            MS2 tolerance for feature generation (default: 0.05).
+        ms2_tolerance_unit : str, optional
+            Unit for MS2 tolerance, either "Da" or "ppm" (default: "Da").
@@
-        consider_modloss: bool, optional
+        consider_modloss : bool, optional
             If modloss ions are considered in the ms2 model. `modloss`
             ions are mostly useful for phospho MS2 prediction model.
             Defaults to True.
+        transfer_learning : bool, optional
+            Enable AlphaPeptDeep MS2 transfer learning (default: False).
         transfer_learning_test_ratio: float, optional
             The ratio of test data for MS2 transfer learning.
             Defaults to 0.3.
         save_retrain_model: bool, optional
             Save retrained MS2 model.
             Defaults to False.
         epoch_to_train_ms2: int, optional
             Epochs to train AlphaPeptDeep MS2 model.
             Defaults to 20.

Also applies to: 27-35, 49-83

♻️ Duplicate comments (1)
quantmsrescore/annotator.py (1)

436-455: Model-selection/fallback logic in _find_and_apply_ms2_model is still inconsistent and can misbehave

There are still a few intertwined issues in this method (some of which were pointed out in prior reviews):

  • A single original_model variable is used for both MS2PIP and AlphaPeptDeep, making fallback semantics ambiguous and, in practice, incorrect for AlphaPeptDeep.
  • The AlphaPeptDeep fallback path (Lines 489–499) re-validates alphapeptdeep_best_model instead of the original model and never updates model_to_use, so you never truly “fall back to the original model” as the comment claims.
  • ms2pip_generator is only created when self._ms2_tolerance_unit == "Da", but the MS2PIP branch (Lines 510–522) is guarded only by the alphapeptdeep_best_corr > ms2pip_best_corr comparison. With the current initialization (ms2pip_best_corr = -1), this is usually safe for "ppm", but it’s still possible in edge cases to reach that else-branch without a defined ms2pip_generator.
  • For "ppm" tolerance, MS2PIP results are not computed, but the code structure still implies a comparison with ms2pip_best_corr, which is a sentinel value rather than a real correlation.

Together, these make the logic hard to reason about and could cause incorrect fallbacks or runtime errors in corner cases. I’d recommend separating “original AlphaPeptDeep model” from “original MS2PIP model” and explicitly restricting the MS2PIP path to the Da-unit case.

A minimal refactor that addresses these points could look like:

-        # Initialize AlphaPeptDeep annotator
-        try:
-            alphapeptdeep_generator = self._create_alphapeptdeep_annotator(model="generic")
-        except Exception as e:
-            logger.error(f"Failed to initialize AlphaPeptDeep: {e}")
-            raise
-
-        # Initialize MS2PIP annotator
-        if self._ms2_tolerance_unit == "Da":
-            try:
-                ms2pip_generator = self._create_ms2pip_annotator()
-                original_model = ms2pip_generator.model
-                model_to_use = original_model
-            except Exception as e:
-                logger.error(f"Failed to initialize MS2PIP: {e}")
-                raise
-        else:
-            original_model = alphapeptdeep_generator.model
-            model_to_use = original_model
+        # Initialize AlphaPeptDeep annotator
+        try:
+            alphapeptdeep_generator = self._create_alphapeptdeep_annotator(model="generic")
+            alphapeptdeep_original_model = alphapeptdeep_generator.model
+        except Exception as e:
+            logger.error(f"Failed to initialize AlphaPeptDeep: {e}")
+            raise
+
+        # Initialize MS2PIP annotator only when Da tolerance is requested
+        if self._ms2_tolerance_unit == "Da":
+            try:
+                ms2pip_generator = self._create_ms2pip_annotator()
+                ms2pip_original_model = ms2pip_generator.model
+                model_to_use = ms2pip_original_model
+            except Exception as e:
+                logger.error(f"Failed to initialize MS2PIP: {e}")
+                raise
+        else:
+            model_to_use = alphapeptdeep_original_model
@@
-            ms2pip_best_corr = -1  # Initial MS2PIP best correlation
-            ms2pip_best_model = None
-
-            # Determine which model to use based on configuration and validation
-            if self._ms2_tolerance_unit == "Da":
-                # Save original model for reference
-                logger.info("Running MS2PIP model")
-                ms2pip_best_model, ms2pip_best_corr = ms2pip_generator._find_best_ms2pip_model(calibration_set)
-            else:
-                original_model = alphapeptdeep_best_model
+            ms2pip_best_corr = float("-inf")  # Initial MS2PIP best correlation
+            ms2pip_best_model = None
+
+            # Determine which model to use based on configuration and validation
+            if self._ms2_tolerance_unit == "Da":
+                logger.info("Running MS2PIP model")
+                ms2pip_best_model, ms2pip_best_corr = ms2pip_generator._find_best_ms2pip_model(calibration_set)
@@
-            if alphapeptdeep_best_corr > ms2pip_best_corr:
-                model_to_use = alphapeptdeep_best_model  # AlphaPeptdeep only has generic model
-                if alphapeptdeep_best_model and alphapeptdeep_generator.validate_features(psm_list=psm_list,
-                                                                                          psms_df=psms_df,
-                                                                                          model=alphapeptdeep_best_model):
+            if alphapeptdeep_best_corr > ms2pip_best_corr:
+                model_to_use = alphapeptdeep_best_model or alphapeptdeep_original_model
+                if alphapeptdeep_best_model and alphapeptdeep_generator.validate_features(
+                    psm_list=psm_list, psms_df=psms_df, model=alphapeptdeep_best_model
+                ):
                     logger.info(
                         f"Using best model: {alphapeptdeep_best_model} with correlation: {alphapeptdeep_best_corr:.4f}")
                 else:
                     # Fallback to original model if best model doesn't validate
-                    if alphapeptdeep_generator.validate_features(psm_list, psms_df, model=alphapeptdeep_best_model):
+                    if alphapeptdeep_generator.validate_features(
+                        psm_list, psms_df, model=alphapeptdeep_original_model
+                    ):
+                        model_to_use = alphapeptdeep_original_model
                         logger.warning("Best model validation failed, falling back to original model")
@@
-            else:
+            elif self._ms2_tolerance_unit == "Da":
                 if ms2pip_best_model and ms2pip_generator.validate_features(psm_list=psm_list, model=ms2pip_best_model):
                     model_to_use = ms2pip_best_model
                     logger.info(f"Using best model: {model_to_use} with correlation: {ms2pip_best_corr:.4f}")
                 else:
                     # Fallback to original model if best model doesn't validate
-                    if ms2pip_generator.validate_features(psm_list,
-                                                          model=original_model if original_model != "generic" else "HCD2021"):
+                    if ms2pip_generator.validate_features(
+                        psm_list,
+                        model=ms2pip_original_model if ms2pip_original_model != "generic" else "HCD2021",
+                    ):
                         logger.warning("Best model validation failed, falling back to original model")
-                        model_to_use = original_model if original_model != "generic" else "HCD2021"
+                        model_to_use = ms2pip_original_model if ms2pip_original_model != "generic" else "HCD2021"

This keeps the external behavior the same in the common cases but:

  • cleanly separates “original” models per generator,
  • ensures fallback actually uses the intended original model,
  • and guarantees MS2PIP is only used (and its generator is only referenced) when the unit is Da.

Also applies to: 474-488, 489-503, 510-522

🧹 Nitpick comments (3)
quantmsrescore/annotator.py (3)

102-116: ms2_tolerance_unit gating and transfer-learning wiring look consistent

The way you:

  • gate self._ms2pip on ms2_tolerance_unit == "Da" and
  • auto-enable AlphaPeptDeep (with a warning) when ms2_tolerance_unit == "ppm" and only MS2PIP was requested, and
  • thread ms2_tolerance_unit, transfer_learning, transfer_learning_test_ratio, save_retrain_model, and epoch_to_train_ms2 through to AlphaPeptDeepAnnotator

is internally consistent and matches the intended behavior described in the PR (MS2PIP only under Da; AlphaPeptDeep under ppm). No functional issues spotted here.

If you want to harden this further, a small optional improvement would be to normalize ms2_tolerance_unit once in __init__ (e.g., lower().strip()) so that "PPM"/"da" are handled cleanly, and then compare against those normalized values.

Also applies to: 124-127, 140-143, 375-393
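
A concrete version of that normalization, as a sketch (the helper name is invented; only the "Da"/"ppm" canonical values come from the PR):

def _normalize_ms2_tolerance_unit(unit: str) -> str:
    # Collapse case/whitespace variants ("PPM", " da ") onto the two supported values.
    canonical = {"da": "Da", "ppm": "ppm"}
    key = unit.strip().lower()
    if key not in canonical:
        raise ValueError(f"Unsupported ms2_tolerance_unit {unit!r}; expected 'Da' or 'ppm'")
    return canonical[key]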


348-352: Logging uses private _peptdeep_model attribute; consider a public accessor

Using alphapeptdeep_generator._peptdeep_model in the log message works but reaches into a private attribute of AlphaPeptDeepAnnotator. This tightens coupling to that class’s internals and could break if its implementation changes.

If feasible, consider exposing a small public property on AlphaPeptDeepAnnotator (e.g., current_model_name) and logging that instead:

-            logger.info(
-                f"Successfully applied AlphaPeptDeep annotation using model: {alphapeptdeep_generator._peptdeep_model}")
+            logger.info(
+                f"Successfully applied AlphaPeptDeep annotation using model: {alphapeptdeep_generator.current_model_name}"
+            )

(Assuming you add current_model_name on the annotator side.)


359-393: _create_alphapeptdeep_annotator docstring is out of sync with the signature

The factory now accepts tolerance and tolerance_unit and always returns an AlphaPeptDeepAnnotator, but the docstring still only documents model and says it returns “AlphaPeptDeep”.

You can align the docs and add a return type hint like this:

-    def _create_alphapeptdeep_annotator(self, model: Optional[str] = None, tolerance: Optional[float] = None,
-                                        tolerance_unit: Optional[str] = None):
+    def _create_alphapeptdeep_annotator(
+        self,
+        model: Optional[str] = None,
+        tolerance: Optional[float] = None,
+        tolerance_unit: Optional[str] = None,
+    ) -> AlphaPeptDeepAnnotator:
@@
-        Parameters
-        ----------
-        model : str, optional
-            AlphaPeptDeep model name to use, defaults to generic if None.
+        Parameters
+        ----------
+        model : str, optional
+            AlphaPeptDeep model name to use, defaults to "generic" if None.
+        tolerance : float, optional
+            Override for MS2 tolerance; falls back to the annotator setting if None.
+        tolerance_unit : str, optional
+            Override for MS2 tolerance unit ("Da" or "ppm"); falls back to the annotator setting if None.
@@
-        AlphaPeptDeep
-            Configured AlphaPeptDeep annotator.
+        AlphaPeptDeepAnnotator
+            Configured AlphaPeptDeep annotator.
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between ca2c542 and cac6698.

📒 Files selected for processing (1)
  • quantmsrescore/annotator.py (11 hunks)
🧰 Additional context used
🪛 Ruff (0.14.7)
quantmsrescore/annotator.py

353-353: Do not catch blind exception: Exception

(BLE001)


354-354: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


450-450: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
quantmsrescore/alphapeptdeep.py (2)

392-435: AlphaPeptDeepAnnotator.__init__ super call is broken and drops consider_modloss.

The current super call passes args and kwargs without unpacking and includes kwargs=kwargs, which doesn’t match the base constructor and will raise a TypeError. It also doesn’t forward consider_modloss to AlphaPeptDeepFeatureGenerator.

Recommend:

-        super().__init__(
-            args,
-            model=model,
-            ms2_tolerance=ms2_tolerance,
-            ms2_tolerance_unit=ms2_tolerance_unit,
-            spectrum_path=spectrum_path,
-            spectrum_id_pattern=spectrum_id_pattern,
-            model_dir=model_dir,
-            processes=processes,
-            kwargs=kwargs,
-        )
+        super().__init__(
+            *args,
+            model=model,
+            ms2_tolerance=ms2_tolerance,
+            ms2_tolerance_unit=ms2_tolerance_unit,
+            spectrum_path=spectrum_path,
+            spectrum_id_pattern=spectrum_id_pattern,
+            model_dir=model_dir,
+            consider_modloss=consider_modloss,
+            processes=processes,
+            **kwargs,
+        )

This restores the expected *args/**kwargs semantics and keeps consider_modloss consistent between base and subclass.


1039-1082: Fix _get_targets_for_psm type annotation and docstring to match implementation.

The function now takes b/y fragment m/z arrays and returns two normalized intensity arrays, but the signature and docstring still describe a different API:

  • Return type is annotated as Tuple[Optional[np.ndarray], Dict[str, np.ndarray]], yet you actually return two np.ndarrays.
  • Docstring mentions psm, encoder, model, ion_types which are no longer parameters.

Recommend:

-def _get_targets_for_psm(
-        b_frag_mzs: np.array,
-        y_frag_mzs: np.array,
-        spectrum: ObservedSpectrum,
-        ms2_tolerance: float,
-        ms2_tolerance_unit: str
-) -> Tuple[Optional[np.ndarray], Dict[str, np.ndarray]]:
+def _get_targets_for_psm(
+        b_frag_mzs: np.ndarray,
+        y_frag_mzs: np.ndarray,
+        spectrum: ObservedSpectrum,
+        ms2_tolerance: float,
+        ms2_tolerance_unit: str,
+) -> Tuple[np.ndarray, np.ndarray]:
@@
-    Parameters
-    ----------
-    psm
-        The PSM to get targets for.
-    spectrum
-        The spectrum to get targets from.
-    encoder
-        The encoder to use for peptide and peptidoform encoding.
-    ms2_tolerance
-        The MS2 tolerance to use.
-    model
-        The model name.
-    ion_types
-        The ion types to use.
+    Parameters
+    ----------
+    b_frag_mzs, y_frag_mzs
+        Theoretical fragment m/z values for b and y ions (1D arrays).
+    spectrum
+        Observed spectrum providing m/z and intensity arrays.
+    ms2_tolerance
+        MS2 tolerance value.
+    ms2_tolerance_unit
+        Unit of the MS2 tolerance ("Da" or "ppm").
@@
-    Tuple[Optional[np.ndarray], Dict[str, np.ndarray]]
-        A tuple containing the encoded peptidoform and the targets.
+    Tuple[np.ndarray, np.ndarray]
+        Normalized matched intensities for b and y ions.

This will make static typing and docs accurate.

♻️ Duplicate comments (2)
quantmsrescore/alphapeptdeep.py (2)

526-546: Conditional saving of retrained model is correctly wired.

The add_features path now mirrors validate_features in using _get_model_manager_and_transfer_flag(), propagates all transfer-learning knobs, and only calls save_ms2_model() when _save_retrain_model is True, which addresses the earlier unconditional-save concern.


857-865: Prediction path after fine-tuning looks correct and now works for both branches.

Switching to predict_all(precursor_df=psms_df, ...) and then unpacking precursor_df from predictions removes the earlier precursor_df-before-assignment issue and ensures predictions run regardless of whether transfer learning happened.

🧹 Nitpick comments (2)
quantmsrescore/alphapeptdeep.py (2)

35-77: Document and align ms2_tolerance_unit semantics in the constructor docstring.

You’ve added ms2_tolerance_unit and stored it on self, but the docstring still says “MS2 mass tolerance in Da” and doesn’t mention the unit parameter. Consider updating the docstring to describe both ms2_tolerance and ms2_tolerance_unit (including the allowed values "Da"/"ppm") to avoid confusing callers.
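
One possible wording, following the numpydoc style used elsewhere in the file (a sketch; the 0.05 default is illustrative and should be checked against the actual signature):

    ms2_tolerance : float, optional
        MS2 mass tolerance used for fragment peak matching. Defaults to 0.05.
    ms2_tolerance_unit : str, optional
        Unit of ms2_tolerance, either "Da" or "ppm". Defaults to "Da".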


824-835: Unused psm_idx loop variable in calibration loop.

In the calibration loop you never use psm_idx, which triggers Ruff’s B007 warning and slightly obscures intent.

You can simplify to:

-            for psm_idx, psm in psms_by_specid[spectrum_id]:
+            for _, psm in psms_by_specid[spectrum_id]:
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between cac6698 and 1a221b9.

📒 Files selected for processing (1)
  • quantmsrescore/alphapeptdeep.py (18 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
quantmsrescore/alphapeptdeep.py (2)
quantmsrescore/openms.py (1)
  • OpenMSHelper (41-632)
quantmsrescore/ms2_model_manager.py (3)
  • MS2ModelManager (12-173)
  • save_ms2_model (172-173)
  • ms2_fine_tuning (42-59)
🪛 Ruff (0.14.7)
quantmsrescore/alphapeptdeep.py

813-817: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


813-817: Avoid specifying long messages outside the exception class

(TRY003)


824-824: Loop control variable psm_idx not used within loop body

(B007)

🔇 Additional comments (6)
quantmsrescore/alphapeptdeep.py (6)

463-484: Model-manager reuse and transfer-learning flag wiring looks sound.

Using _get_model_manager_and_transfer_flag() here avoids re-downloading/re-initializing models and correctly disables further transfer learning once _peptdeep_model is set. Storing model_weights into _peptdeep_model for reuse on subsequent runs is also consistent with how ms2_fine_tune returns the manager.


655-675: Reuse of _get_model_manager_and_transfer_flag in _find_best_ms2_model is consistent.

This call path now uses the same model manager / transfer-learning selection logic as validation and feature addition, and updates _peptdeep_model with the returned manager, which will avoid redundant fine-tuning on subsequent invocations.


710-719: Helper _get_model_manager_and_transfer_flag is a good extraction.

The helper cleanly encapsulates “reuse retrained model vs. create new manager” and avoids code duplication across three call sites. Behavior of returning transfer_learning=False once _peptdeep_model is set matches the intended one-time fine-tuning design.
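
From that description, the helper's contract is roughly the following. This is a sketch reconstructed from the review notes, not the file's literal body; the transfer_learning attribute name is assumed.

def _get_model_manager_and_transfer_flag(self):
    # Reuse the fine-tuned manager from an earlier run; once it exists,
    # transfer learning must not be triggered a second time.
    if self._peptdeep_model is not None:
        return self._peptdeep_model, False
    model_mgr = MS2ModelManager(
        mask_modloss=not self.consider_modloss,
        device="cpu",
        model_dir=self.model_dir,
    )
    return model_mgr, self.transfer_learning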


722-759: custom_correlate refactor to return (results, model_mgr) looks correct.

The updated signature passes the ModelManager instance through to ms2_fine_tune, threads ms2_tolerance_unit and transfer-learning parameters, and now returns both results and the (possibly) fine-tuned manager. The call sites have been updated to unpack this tuple, so this change is internally consistent.


907-909: _get_targets_for_psm call correctly propagates ms2_tolerance_unit.

The call site now passes ms2_tolerance_unit alongside ms2_tolerance, aligning it with the updated _get_targets_for_psm signature and ensuring ppm-vs-Da behavior is honored during evaluation.


1085-1128: _get_targets_df_for_psm implementation is consistent with ppm/Da logic and training needs.

Flattening mz_df, matching with per-peak tolerances derived from ms2_tolerance_unit, reshaping back to the original shape, and then normalizing by the maximum intensity is a clean way to prepare training targets for the MS2 model. This helper nicely mirrors _get_targets_for_psm and fits the fine-tuning workflow.
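
Condensed into a standalone sketch, that flatten-match-reshape-normalize pipeline looks roughly like this (names and the nearest-peak search are illustrative; the real helper operates on peptdeep fragment DataFrames):

import numpy as np
import pandas as pd

def match_targets(mz_df: pd.DataFrame, spec_mz: np.ndarray, spec_int: np.ndarray,
                  tol: float, unit: str) -> pd.DataFrame:
    # spec_mz is assumed sorted ascending and non-empty, as in mzML peak arrays.
    flat = mz_df.to_numpy(dtype=float).ravel()           # flatten (n_frags x n_frag_types)
    tols = flat * tol * 1e-6 if unit == "ppm" else np.full_like(flat, tol)
    right = np.clip(np.searchsorted(spec_mz, flat), 0, len(spec_mz) - 1)
    left = np.clip(right - 1, 0, len(spec_mz) - 1)
    # Pick whichever neighboring observed peak is closer to the theoretical m/z.
    idx = np.where(np.abs(spec_mz[left] - flat) <= np.abs(spec_mz[right] - flat), left, right)
    matched = np.where(np.abs(spec_mz[idx] - flat) <= tols, spec_int[idx], 0.0)
    out = matched.reshape(mz_df.shape)                   # restore the original layout
    peak = out.max()
    return pd.DataFrame(out / peak if peak > 0 else out, columns=mz_df.columns)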

Comment on lines +184 to 201
model_mgr = MS2ModelManager(
mask_modloss=not self._consider_modloss,
device="cpu",
model_dir=self.model_dir
)

alphapeptdeep_results = custom_correlate(
psms=psm_list_run,
spectrum_file=str(spectrum_filename),
psms_df=psms_df,
spectrum_id_pattern=self.spectrum_id_pattern,
- model=self.model,
+ model=model_mgr,
ms2_tolerance=self.ms2_tolerance,
ms2_tolerance_unit=self.ms2_tolerance_unit,
compute_correlations=False,
model_dir=self.model_dir,
processes=self.processes,
consider_modloss=self.consider_modloss
)
Contributor

⚠️ Potential issue | 🔴 Critical

Fix use of undefined attribute _consider_modloss in AlphaPeptDeepFeatureGenerator.add_features.

AlphaPeptDeepFeatureGenerator.__init__ sets self.consider_modloss, but add_features uses self._consider_modloss when constructing MS2ModelManager. This will raise AttributeError whenever AlphaPeptDeepFeatureGenerator.add_features is used directly.

Apply:

-                    model_mgr = MS2ModelManager(
-                        mask_modloss=not self._consider_modloss,
-                        device="cpu",
-                        model_dir=self.model_dir
-                    )
+                    model_mgr = MS2ModelManager(
+                        mask_modloss=not self.consider_modloss,
+                        device="cpu",
+                        model_dir=self.model_dir,
+                    )
🤖 Prompt for AI Agents
In quantmsrescore/alphapeptdeep.py around lines 184 to 201, add_features
references a non-existent attribute self._consider_modloss when constructing
MS2ModelManager; change that to use the existing initializer attribute
self.consider_modloss so the call becomes MS2ModelManager(mask_modloss=not
self.consider_modloss, ...) ensuring consistency with the rest of the method and
avoiding AttributeError.

Comment on lines 761 to 856
def ms2_fine_tune(enumerated_psm_list, psms_df, spec_file, spectrum_id_pattern, model_mgr,
ms2_tolerance, ms2_tolerance_unit, processes, consider_modloss, higher_score_better,
calibration_set_size, transfer_learning, transfer_learning_test_ratio, epoch_to_train_ms2):
if consider_modloss:
frag_types = ['b_z1', 'y_z1', 'b_z2', 'y_z2',
'b_modloss_z1', 'b_modloss_z2',
'y_modloss_z1', 'y_modloss_z2']
else:
model_mgr = ModelManager(mask_modloss=not consider_modloss, device="cpu")
model_mgr.load_installed_models(model)
if consider_modloss:
predictions = model_mgr.predict_all(precursor_df=psms_df, predict_items=["ms2"],
frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
'b_modloss_z1', 'b_modloss_z2',
'y_modloss_z1', 'y_modloss_z2'],
process_num=processes)
else:
predictions = model_mgr.predict_all(precursor_df=psms_df, predict_items=["ms2"],
frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'],
process_num=processes)
precursor_df, predict_int_df, theoretical_mz_df = predictions["precursor_df"], predictions[
"fragment_intensity_df"], predictions["fragment_mz_df"]
frag_types = ['b_z1', 'y_z1', 'b_z2', 'y_z2']

if transfer_learning:
enumerated_psm_list_copy = enumerated_psm_list.copy()
# Select only PSMs that are target and not decoys
enumerated_psm_list_copy = [
psm
for psm in enumerated_psm_list_copy
if not psm.is_decoy and psm.rank == 1
]
# Sort alphapeptdeep results by PSM score and lower score is better
enumerated_psm_list_copy.sort(key=lambda x: x.score, reverse=higher_score_better)

# Get a calibration set, the % of psms to be used for calibration is defined by calibration_set_size
calibration_set = enumerated_psm_list_copy[
: int(len(enumerated_psm_list_copy) * calibration_set_size)
]

precursor_df = psms_df[
psms_df["provenance_data"].isin([next(iter(psm.provenance_data.keys())) for psm in calibration_set])]

theoretical_mz_df = create_fragment_mz_dataframe(precursor_df, frag_types)

precursor_df = precursor_df.set_index("provenance_data")

# Compile regex for spectrum ID matching
try:
spectrum_id_regex = re.compile(spectrum_id_pattern)
except TypeError:
spectrum_id_regex = re.compile(r"(.*)")

# Organize PSMs by spectrum ID
psms_by_specid = _organize_psms_by_spectrum_id(calibration_set)

match_intensity_df = []
current_index = 0
# Process each spectrum
for spectrum in read_spectrum_file(spec_file):

# Match spectrum ID with provided regex
match = spectrum_id_regex.search(spectrum.identifier)
try:
spectrum_id = match[1]
except (TypeError, IndexError):
raise exceptions.TitlePatternError(
f"Spectrum title pattern `{spectrum_id_pattern}` could not be matched to "
f"spectrum ID `{spectrum.identifier}`. "
" Are you sure that the regex contains a capturing group?"
)

# Skip if no matching PSMs
if spectrum_id not in psms_by_specid:
continue

# Process each PSM for this spectrum
for psm_idx, psm in psms_by_specid[spectrum_id]:
row_index = next(iter(psm.provenance_data.keys()))
row = precursor_df.loc[row_index]
mz = theoretical_mz_df.iloc[row["frag_start_idx"]:row["frag_stop_idx"], ]
match_intensity = _get_targets_df_for_psm(
mz, spectrum, ms2_tolerance, ms2_tolerance_unit
)
fragment_len = match_intensity.shape[0]
precursor_df.loc[row_index, "match_start_idx"] = current_index
precursor_df.loc[row_index, "match_stop_idx"] = current_index + fragment_len
match_intensity_df.append(match_intensity)
current_index += fragment_len

match_intensity_df = pd.concat(match_intensity_df, ignore_index=True)

psm_num_to_train_ms2 = int(len(calibration_set) * (1 - transfer_learning_test_ratio))
psm_num_to_test_ms2 = len(calibration_set) - psm_num_to_train_ms2

precursor_df.drop(columns=["frag_start_idx", "frag_stop_idx"], inplace=True)
precursor_df.rename(columns={
"match_start_idx": "frag_start_idx",
"match_stop_idx": "frag_stop_idx"
}, inplace=True)
precursor_df["frag_start_idx"] = precursor_df["frag_start_idx"].astype("int64")
precursor_df["frag_stop_idx"] = precursor_df["frag_stop_idx"].astype("int64")

model_mgr.ms2_fine_tuning(precursor_df,
match_intensity_df,
psm_num_to_train_ms2=psm_num_to_train_ms2,
psm_num_to_test_ms2=psm_num_to_test_ms2,
train_verbose=True,
epoch_to_train_ms2=epoch_to_train_ms2)

Contributor

⚠️ Potential issue | 🔴 Critical

Guard against empty calibration sets and unmatched spectra in ms2_fine_tune.

When transfer_learning is True, several edge cases can break:

  • If len(enumerated_psm_list_copy) * calibration_set_size < 1, calibration_set is empty, so match_intensity_df stays empty and pd.concat(match_intensity_df, ...) (Line 838) raises ValueError.
  • Even with non-empty calibration_set, if no spectra match (e.g., due to spectrum_id_pattern issues), match_intensity_df is still empty and you hit the same pd.concat failure.

This makes small or edge-case datasets crash instead of gracefully skipping fine-tuning.

Consider:

-        calibration_set = enumerated_psm_list_copy[
-                          : int(len(enumerated_psm_list_copy) * calibration_set_size)
-                          ]
-
-        precursor_df = psms_df[
-            psms_df["provenance_data"].isin([next(iter(psm.provenance_data.keys())) for psm in calibration_set])]
+        calibration_len = int(len(enumerated_psm_list_copy) * calibration_set_size)
+        calibration_set = enumerated_psm_list_copy[:calibration_len]
+
+        if not calibration_set:
+            logging.info(
+                "No PSMs available for MS2 transfer learning calibration; "
+                "skipping fine-tuning."
+            )
+        else:
+            precursor_df = psms_df[
+                psms_df["provenance_data"].isin(
+                    [next(iter(psm.provenance_data.keys())) for psm in calibration_set]
+                )
+            ]
             ...
-        match_intensity_df = pd.concat(match_intensity_df, ignore_index=True)
+        if not match_intensity_df:
+            logging.info(
+                "No matched fragment intensities found for calibration set; "
+                "skipping MS2 fine-tuning."
+            )
+        else:
+            match_intensity_df = pd.concat(match_intensity_df, ignore_index=True)
+            psm_num_to_train_ms2 = int(len(calibration_set) * (1 - transfer_learning_test_ratio))
+            psm_num_to_test_ms2 = len(calibration_set) - psm_num_to_train_ms2
+            ...
+            model_mgr.ms2_fine_tuning(
+                precursor_df,
+                match_intensity_df,
+                psm_num_to_train_ms2=psm_num_to_train_ms2,
+                psm_num_to_test_ms2=psm_num_to_test_ms2,
+                train_verbose=True,
+                epoch_to_train_ms2=epoch_to_train_ms2,
+            )

This keeps prediction behavior intact while avoiding crashes when calibration data is insufficient.

Committable suggestion skipped: line range outside the PR's diff.


🤖 Prompt for AI Agents
In quantmsrescore/alphapeptdeep.py around lines 761-856 the transfer_learning
branch can produce an empty calibration_set or no matched spectra so
match_intensity_df stays empty and pd.concat(...) (line ~838) will raise a
ValueError; before concatenating and before calling model_mgr.ms2_fine_tuning,
check if calibration_set is non-empty and that match_intensity_df contains at
least one DataFrame (or rows); if empty, skip fine-tuning (return or continue)
and optionally log/warn, and only compute
psm_num_to_train_ms2/psm_num_to_test_ms2 and call model_mgr.ms2_fine_tuning when
there is sufficient data (e.g., at least one training PSM), ensuring types are
safe (ints) to avoid downstream errors.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
quantmsrescore/annotator.py (1)

486-503: Fallback logic is broken: validating the same model twice.

This issue was flagged in a previous review and remains unresolved. The problem:

  1. Line 488 reassigns original_model = alphapeptdeep_best_model
  2. Lines 492-494 validate alphapeptdeep_best_model
  3. If validation fails, line 499 validates alphapeptdeep_best_model again (since original_model now equals alphapeptdeep_best_model)

The fallback should validate the actual original model (e.g., "generic"), not the best model again.

             else:
-                logger.info("MS2PIP model don't support ppm tolerance unit. Only consider AlphaPeptDeep model")
-                original_model = alphapeptdeep_best_model
+                logger.info("MS2PIP model doesn't support ppm tolerance unit. Only consider AlphaPeptDeep model")
+                # Keep original_model as "generic" for fallback purposes

         if alphapeptdeep_best_corr > ms2pip_best_corr:
             model_to_use = alphapeptdeep_best_model  # AlphaPeptdeep only has generic model
             if alphapeptdeep_best_model and alphapeptdeep_generator.validate_features(psm_list=psm_list,
                                                                                       psms_df=psms_df,
                                                                                       model=alphapeptdeep_best_model):
                 logger.info(
                     f"Using best model: {alphapeptdeep_best_model} with correlation: {alphapeptdeep_best_corr:.4f}")
             else:
                 # Fallback to original model if best model doesn't validate
-                if alphapeptdeep_generator.validate_features(psm_list, psms_df, model=alphapeptdeep_best_model):
+                if alphapeptdeep_generator.validate_features(psm_list, psms_df, model=original_model):
                     logger.warning("Best model validation failed, falling back to original model")
+                    model_to_use = original_model
                 else:
                     logger.error("Both best model and original model validation failed")
                     return  # Exit early since no valid model is available
🧹 Nitpick comments (3)
quantmsrescore/annotator.py (3)

74-83: Missing docstring for transfer_learning parameter.

The new transfer_learning parameter is added to the constructor signature but is not documented in the docstring alongside the other new parameters.

         ions are mostly useful for phospho MS2 prediction model.
             Defaults to True.
+        transfer_learning: bool, optional
+            Enable transfer learning for AlphaPeptDeep MS2 model.
+            Defaults to False.
         transfer_learning_test_ratio: float, optional
             The ratio of test data for MS2 transfer learning.
             Defaults to 0.3.

353-355: Use logging.exception to include traceback in error logs.

When catching exceptions, logging.exception automatically includes the traceback which aids debugging.

         except Exception as e:
-            logger.error(f"Failed to apply AlphaPeptDeep annotation: {e}")
+            logger.exception(f"Failed to apply AlphaPeptDeep annotation: {e}")
             return  # Indicate failure through early return

449-451: Use logging.exception for consistent error logging.

Similar to the earlier exception handler, use logging.exception to capture the full traceback.

             except Exception as e:
-                logger.error(f"Failed to initialize MS2PIP: {e}")
+                logger.exception(f"Failed to initialize MS2PIP: {e}")
                 raise
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 1a221b9 and f1d3e9a.

📒 Files selected for processing (1)
  • quantmsrescore/annotator.py (11 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
quantmsrescore/annotator.py (4)
quantmsrescore/alphapeptdeep.py (2)
  • _find_best_ms2_model (629-681)
  • validate_features (437-500)
quantmsrescore/ms2pip.py (2)
  • _find_best_ms2pip_model (413-460)
  • validate_features (245-296)
quantmsrescore/openms.py (1)
  • validate_features (390-415)
quantmsrescore/idxmlreader.py (2)
  • psms_df (102-104)
  • psms_df (107-111)
🪛 Ruff (0.14.7)
quantmsrescore/annotator.py

353-353: Do not catch blind exception: Exception

(BLE001)


354-354: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


450-450: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

🔇 Additional comments (4)
quantmsrescore/annotator.py (4)

105-115: LGTM!

The conditional logic correctly handles the MS2PIP/AlphaPeptDeep selection based on tolerance unit, with an appropriate warning when ppm is requested but MS2PIP doesn't support it.


359-393: LGTM!

The new tolerance_unit parameter and transfer-learning-related parameters are correctly wired through to the AlphaPeptDeepAnnotator.


350-351: LGTM!

Logging the actual _peptdeep_model object provides more meaningful information about the model state after potential transfer learning, rather than just the string name.


517-520: Defensive fallback check is acceptable but could be clearer.

The original_model != "generic" check is defensive—MS2PIP models are never "generic", so this condition should always be true in this branch. The code is functionally correct since this else branch is only reachable in Da mode where original_model comes from ms2pip_generator.model.

Contributor

Copilot AI commented Dec 2, 2025

@ypriverol I've opened a new pull request, #42, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 5 commits December 2, 2025 15:45
Co-authored-by: ypriverol <52113+ypriverol@users.noreply.github.com>
Co-authored-by: ypriverol <52113+ypriverol@users.noreply.github.com>
Co-authored-by: ypriverol <52113+ypriverol@users.noreply.github.com>
Co-authored-by: ypriverol <52113+ypriverol@users.noreply.github.com>
Fix uninitialized ms2pip_generator access in ppm tolerance mode
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (5)
quantmsrescore/annotator.py (5)

27-35: Synchronize __init__ docstring with new parameters and defaults

The constructor signature now includes ms2_tolerance_unit, transfer_learning, transfer_learning_test_ratio, save_retrain_model, and epoch_to_train_ms2, but the docstring:

  • Doesn’t document ms2_tolerance_unit or transfer_learning.
  • Still uses the example "ms2pip,deeplc" while alphapeptdeep is also supported.
  • States consider_modloss “Defaults to True.” while the default value on Line 33 is False.

Consider updating the docstring so public API docs match the actual signature and behavior. For example:

-        feature_generators : str
-            Comma-separated list of feature generators (e.g., "ms2pip,deeplc").
+        feature_generators : str
+            Comma-separated list of feature generators (e.g., "ms2pip,deeplc,alphapeptdeep").
@@
-        ms2_tolerance : float, optional
-            MS2 tolerance for feature generation (default: 0.05).
+        ms2_tolerance : float, optional
+            MS2 tolerance for feature generation (default: 0.05).
+        ms2_tolerance_unit : str, optional
+            Unit of the MS2 tolerance, either "Da" or "ppm" (default: "Da").
@@
-        consider_modloss: bool, optional
-            If modloss ions are considered in the ms2 model. `modloss`
-            ions are mostly useful for phospho MS2 prediction model.
-            Defaults to True.
+        consider_modloss: bool, optional
+            If modloss ions are considered in the MS2 model. `modloss`
+            ions are mostly useful for phospho MS2 prediction models.
+            Defaults to False.
+        transfer_learning: bool, optional
+            Enable AlphaPeptDeep MS2 transfer learning on the current dataset.
+            Defaults to False.
@@
-        epoch_to_train_ms2: int, optional
-            Epochs to train AlphaPeptDeep MS2 model.
-            Defaults to 20.
+        epoch_to_train_ms2: int, optional
+            Number of epochs to train the AlphaPeptDeep MS2 model.
+            Defaults to 20.

This keeps the CLI/API help aligned with what the code actually does.

Also applies to: 71-83


105-115: Validate ms2_tolerance_unit early to avoid silent misconfiguration

The initialization logic for _ms2pip / _alphapeptdeep and _ms2_tolerance_unit assumes the unit is either "Da" or "ppm". A typo like "da" or "PPM" would:

  • Set self._ms2pip = False.
  • Only enable AlphaPeptDeep when explicitly requested (or via the "ms2pip"+"ppm" special case), otherwise silently drop MS2 features.

Consider validating ms2_tolerance_unit up front and failing fast on unsupported values:

-        self._deepLC = "deeplc" in feature_annotators
+        self._deepLC = "deeplc" in feature_annotators
+
+        valid_units = {"Da", "ppm"}
+        if ms2_tolerance_unit not in valid_units:
+            raise ValueError(
+                f"Unsupported ms2_tolerance_unit '{ms2_tolerance_unit}'. "
+                f"Expected one of {sorted(valid_units)}."
+            )

         if "ms2pip" in feature_annotators and ms2_tolerance_unit == "Da":
             self._ms2pip = True

Optional: you may also want to normalize case (e.g., .upper()) before comparison.

Also applies to: 127-127, 140-143


353-355: Tighten exception handling: avoid blind Exception and use logger.exception

In several places you catch Exception and log with logger.error:

  • Line 353 in _run_alphapeptdeep_annotation
  • Line 493 when initializing MS2PIP in _find_and_apply_ms2_model
  • Line 569 when applying MS2 annotation

Catching Exception makes it easy to hide programming errors, and logger.error loses stack traces that are very valuable when debugging model/IO issues.

At minimum, prefer logger.exception so the traceback is captured, and consider narrowing the exception types (e.g., to IO/model-specific exceptions) where feasible. For example:

-        except Exception as e:
-            logger.error(f"Failed to apply AlphaPeptDeep annotation: {e}")
-            return  # Indicate failure through early return
+        except Exception:
+            logger.exception("Failed to apply AlphaPeptDeep annotation")
+            return  # Indicate failure through early return
@@
-        except Exception as e:
-            logger.error(f"Failed to initialize MS2PIP: {e}")
-            raise
+        except Exception:
+            logger.exception("Failed to initialize MS2PIP")
+            raise
@@
-        except Exception as e:
-            logger.error(f"Failed to apply MS2 annotation: {e}")
-            return  # Indicate failure through early return
+        except Exception:
+            logger.exception("Failed to apply MS2 annotation")
+            return  # Indicate failure through early return

This keeps behavior the same while improving diagnosability.

Also applies to: 493-494, 569-571


359-360: Update _create_alphapeptdeep_annotator docstring to reflect new arguments

The _create_alphapeptdeep_annotator helper now accepts tolerance, tolerance_unit, and wires several transfer-learning-related options into AlphaPeptDeepAnnotator, but the docstring still only documents the model parameter.

To avoid confusing future readers, either:

  • Expand the docstring to mention tolerance / tolerance_unit and briefly note that transfer-learning options are taken from instance state, or
  • Simplify the docstring to focus on the high-level purpose (“construct a configured AlphaPeptDeepAnnotator”) without a partial parameter list.

No behavior change needed; this is purely documentation hygiene.

Also applies to: 369-373, 377-377, 388-393


481-559: Da vs ppm control flow in _find_and_apply_ms2_model is much cleaner; a couple of minor nits

The refactored _find_and_apply_ms2_model now:

  • Always initializes alphapeptdeep_generator, and only initializes ms2pip_generator when self._ms2_tolerance_unit == "Da".
  • For "ppm", correctly skips MS2PIP, logs that only AlphaPeptDeep is considered, and delegates to _apply_alphapeptdeep_model.
  • For "Da", compares alphapeptdeep_best_corr vs ms2pip_best_corr and uses _apply_alphapeptdeep_model or the MS2PIP branch accordingly.

This addresses the earlier review concerns about uninitialized original_model/ms2pip_generator in the ppm path and mismatched fallbacks between AlphaPeptDeep and MS2PIP.

A couple of small cleanups to consider:

  1. Outdated docstring parameter

    The docstring still lists a psm_list : PSMList parameter even though _find_and_apply_ms2_model now takes only self and pulls PSMs from self._idxml_reader. You could drop that parameter section or update it to describe the method as operating on the currently loaded dataset.

  2. Redundant "generic" special-casing in MS2PIP fallback

    In the MS2PIP fallback you have:

    if ms2pip_generator.validate_features(psm_list,
                                          model=original_model if original_model != "generic" else "HCD2021"):
        ...
        model_to_use = original_model if original_model != "generic" else "HCD2021"

    Here original_model comes from ms2pip_generator.model, which is derived from self._ms2_model and will not be "generic" (that name is used on the AlphaPeptDeep side). You can simplify this to just use original_model:

-                if ms2pip_generator.validate_features(psm_list,
-                                                      model=original_model if original_model != "generic" else "HCD2021"):
+                if ms2pip_generator.validate_features(psm_list, model=original_model):
@@
-                    model_to_use = original_model if original_model != "generic" else "HCD2021"
+                    model_to_use = original_model

These tweaks would further reduce confusion without altering the main behavior.

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between f1d3e9a and 660b96f.

📒 Files selected for processing (1)
  • quantmsrescore/annotator.py (11 hunks)
🧰 Additional context used
🪛 Ruff (0.14.7)
quantmsrescore/annotator.py

353-353: Do not catch blind exception: Exception

(BLE001)


354-354: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


493-493: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

🔇 Additional comments (1)
quantmsrescore/annotator.py (1)

425-467: AlphaPeptDeep best-model selection helper looks consistent and robust

The new _apply_alphapeptdeep_model cleanly encapsulates:

  • Validation of alphapeptdeep_best_model vs the provided original_model.
  • Fallback behavior when the best model fails validation.
  • Application of the selected model plus setting self.ms2_generator = "AlphaPeptDeep" so downstream feature renaming works correctly.

The control flow (returning False when both models fail validation) is straightforward and makes _find_and_apply_ms2_model easier to reason about. No changes requested here.
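
For reference, the contract those bullets describe, in sketch form (signatures inferred from the validate_features calls quoted earlier in the thread; the add_features call is assumed):

def _apply_alphapeptdeep_model(self, generator, best_model, original_model,
                               psm_list, psms_df) -> bool:
    # Try the best-scoring model first, then the original; report failure
    # only when neither validates.
    for candidate in (best_model, original_model):
        if candidate and generator.validate_features(psm_list=psm_list,
                                                     psms_df=psms_df,
                                                     model=candidate):
            generator.add_features(psm_list=psm_list, psms_df=psms_df, model=candidate)
            self.ms2_generator = "AlphaPeptDeep"
            return True
    return False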

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 15 comments.

Comments suppressed due to low confidence (1)

quantmsrescore/alphapeptdeep.py:424

  • The super().__init__() call has several issues: 1) Line 415 passes args instead of *args, 2) Line 423 passes kwargs=kwargs which creates a named parameter instead of unpacking with **kwargs, and 3) consider_modloss is not passed to the parent class constructor even though it's a parameter in the parent's __init__ (line 44). This means the parent class won't have the consider_modloss attribute set, which could cause AttributeError or use wrong values. Fix: Change line 415 to *args, line 423 to **kwargs, and add consider_modloss=consider_modloss to the super().__init__() call.
        super().__init__(
            args,
            model=model,
            ms2_tolerance=ms2_tolerance,
            ms2_tolerance_unit=ms2_tolerance_unit,
            spectrum_path=spectrum_path,
            spectrum_id_pattern=spectrum_id_pattern,
            model_dir=model_dir,
            processes=processes,
            kwargs=kwargs,
        )


Comment on lines 119 to 131
@click.option("--transfer_learning",
help="Enabling transfer learning for AlphaPeptDeep MS2 prediction",
is_flag=True)
@click.option("--transfer_learning_test_ratio",
help="The ratio of test data for MS2 transfer learning",
default=0.30)
@click.option("--save_retrain_model",
help="Save retrained AlphaPeptDeep MS2 model weights",
is_flag=True)
@click.option("--epoch_to_train_ms2",
help="Epoch to train AlphaPeptDeep MS2 model",
type=int,
default=20)
Copilot AI Dec 3, 2025

The Click options added for transfer learning (lines 119-131) use snake_case like the other options in this file (e.g., --ms2_tolerance, --calibration_set_size), but their names are inconsistent with one another and not self-describing. For consistency and clarity, consider grouping them under a common prefix: --transfer_learning_enabled, --transfer_learning_epochs (instead of --epoch_to_train_ms2), and --transfer_learning_save_model (instead of --save_retrain_model).

Comment on lines 12 to 173
class MS2ModelManager(ModelManager):
def __init__(self,
mask_modloss: bool = False,
device: str = "gpu",
model_dir: str = None,
):
self._train_psm_logging = True

self.ms2_model: pDeepModel = pDeepModel(
mask_modloss=mask_modloss, device=device
)
self.rt_model: AlphaRTModel = AlphaRTModel(device=device)
self.ccs_model: AlphaCCSModel = AlphaCCSModel(device=device)

self.charge_model: ChargeModelForModAASeq = ChargeModelForModAASeq(
device=device
)

if model_dir is not None and os.path.exists(os.path.join(model_dir, "ms2.pth")):
self.load_external_models(ms2_model_file=os.path.join(model_dir, "ms2.pth"))
self.model_str = model_dir
else:
_download_models(MODEL_ZIP_FILE_PATH)
self.load_installed_models()
self.model_str = "generic"
self.reset_by_global_settings(reload_models=False)

def __str__(self):
return self.model_str

def ms2_fine_tuning(self, psms_df: pd.DataFrame,
match_intensity_df: pd.DataFrame,
                 psm_num_to_train_ms2: int = 100000000,
                 use_grid_nce_search: bool = False,
                 top_n_mods_to_train: int = 10,
                 psm_num_per_mod_to_train_ms2: int = 50,
                 psm_num_to_test_ms2: int = 0,
                 epoch_to_train_ms2: int = 20,
                 train_verbose: bool = False):

        self.psm_num_to_train_ms2 = psm_num_to_train_ms2
        self.use_grid_nce_search = use_grid_nce_search
        self.top_n_mods_to_train = top_n_mods_to_train
        self.psm_num_per_mod_to_train_ms2 = psm_num_per_mod_to_train_ms2
        self.psm_num_to_test_ms2 = psm_num_to_test_ms2
        self.train_verbose = train_verbose
        self.epoch_to_train_ms2 = epoch_to_train_ms2
        self.train_ms2_model(psms_df, match_intensity_df)

    def train_ms2_model(
        self,
        psm_df: pd.DataFrame,
        matched_intensity_df: pd.DataFrame,
    ):
        """
        Train/fine-tune the MS2 model using `matched_intensity_df`.

        1. It samples `n=self.psm_num_to_train_ms2` PSMs into a training dataframe (`tr_df`) for fine-tuning.
        2. It also includes some important PTMs (`n=self.top_n_mods_to_train`) in `tr_df` for fine-tuning.
        3. If `self.use_grid_nce_search==True`, this method calls `self.ms2_model.grid_nce_search` to find the best NCE and instrument.

        Parameters
        ----------
        psm_df : pd.DataFrame
            PSM dataframe for fine-tuning

        matched_intensity_df : pd.DataFrame
            The matched fragment intensities for `psm_df`.
        """
        if self.psm_num_to_train_ms2 > 0:
            if self.psm_num_to_train_ms2 < len(psm_df):
                tr_df = psm_sampling_with_important_mods(
                    psm_df,
                    self.psm_num_to_train_ms2,
                    self.top_n_mods_to_train,
                    self.psm_num_per_mod_to_train_ms2,
                ).copy()
            else:
                tr_df = psm_df
            if len(tr_df) > 0:
                tr_inten_df = pd.DataFrame()
                for frag_type in self.ms2_model.charged_frag_types:
                    if frag_type in matched_intensity_df.columns:
                        tr_inten_df[frag_type] = matched_intensity_df[frag_type]
                    else:
                        tr_inten_df[frag_type] = 0.0

                if self.use_grid_nce_search:
                    self.nce, self.instrument = self.ms2_model.grid_nce_search(
                        tr_df,
                        tr_inten_df,
                        nce_first=model_mgr_settings["transfer"]["grid_nce_first"],
                        nce_last=model_mgr_settings["transfer"]["grid_nce_last"],
                        nce_step=model_mgr_settings["transfer"]["grid_nce_step"],
                        search_instruments=model_mgr_settings["transfer"][
                            "grid_instrument"
                        ],
                    )
                    tr_df["nce"] = self.nce
                    tr_df["instrument"] = self.instrument
                else:
                    self.set_default_nce_instrument(tr_df)
        else:
            tr_df = pd.DataFrame()

        if self.psm_num_to_test_ms2 > 0:
            if len(tr_df) > 0:
                test_psm_df = psm_df[~psm_df.sequence.isin(set(tr_df.sequence))].copy()
                if len(test_psm_df) > self.psm_num_to_test_ms2:
                    test_psm_df = test_psm_df.sample(n=self.psm_num_to_test_ms2)
                elif len(test_psm_df) == 0:
                    logging.info(
                        "Not enough PSMs for testing the MS2 model; "
                        "please reduce the `psm_num_to_train_ms2` "
                        "value according to the overall PSM count."
                    )
                    test_psm_df = pd.DataFrame()
            else:
                test_psm_df = psm_df.copy()
                tr_inten_df = pd.DataFrame()
                for frag_type in self.ms2_model.charged_frag_types:
                    if frag_type in matched_intensity_df.columns:
                        tr_inten_df[frag_type] = matched_intensity_df[frag_type]
                    else:
                        tr_inten_df[frag_type] = 0.0
                self.set_default_nce_instrument(test_psm_df)
        else:
            test_psm_df = pd.DataFrame()

        if len(test_psm_df) > 0:
            logging.info(
                "Testing pretrained MS2 model on testing df:\n"
                + str(self.ms2_model.test(test_psm_df, tr_inten_df))
            )
        if len(tr_df) > 0:
            if self._train_psm_logging:
                logging.info(
                    f"{len(tr_df)} PSMs for MS2 model training/transfer learning"
                )
            self.ms2_model.train(
                tr_df,
                fragment_intensity_df=tr_inten_df,
                batch_size=self.batch_size_to_train_ms2,
                epoch=self.epoch_to_train_ms2,
                warmup_epoch=self.warmup_epoch_to_train_ms2,
                lr=self.lr_to_train_ms2,
                verbose=self.train_verbose,
            )
            logging.info(
                "Testing refined MS2 model on training df:\n"
                + str(self.ms2_model.test(tr_df, tr_inten_df))
            )
            if len(test_psm_df) > 0:
                logging.info(
                    "Testing refined MS2 model on testing df:\n"
                    + str(self.ms2_model.test(test_psm_df, tr_inten_df))
                )

        self.model_str = "retrained_model"

    def save_ms2_model(self):
        self.ms2_model.save("retrained_ms2.pth")
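
For orientation, here is a minimal usage sketch of the class above. It is illustrative only: the placeholder dataframes are empty (real calls need AlphaPeptDeep-style PSM and matched-intensity frames from the annotator's matching step), and the train/test split mirrors how the annotator derives it from transfer_learning_test_ratio.

import pandas as pd

from quantmsrescore.ms2_model_manager import MS2ModelManager

# Placeholder inputs; real usage supplies AlphaPeptDeep-style PSM and
# matched fragment-intensity dataframes produced during spectrum matching.
precursor_df = pd.DataFrame()
intensity_df = pd.DataFrame()
transfer_learning_test_ratio = 0.3

# Same train/test split the annotator computes from the calibration set.
n_train = int(len(precursor_df) * (1 - transfer_learning_test_ratio))
n_test = len(precursor_df) - n_train

mgr = MS2ModelManager(mask_modloss=True, device="cpu", model_dir=None)
mgr.ms2_fine_tuning(
    precursor_df,
    intensity_df,
    psm_num_to_train_ms2=n_train,  # 0 on these empty frames, so training is skipped
    psm_num_to_test_ms2=n_test,
    epoch_to_train_ms2=20,
    train_verbose=True,
)
mgr.save_ms2_model()  # persists the fine-tuned weights to "retrained_ms2.pth"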
Copilot AI Dec 3, 2025

The new MS2ModelManager class lacks test coverage. Given that this is a critical component for transfer learning functionality, tests should be added to verify initialization, fine-tuning behavior, model saving, and edge cases like empty dataframes or missing model files.

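A minimal pytest sketch of what such coverage could look like, assuming MS2ModelManager is importable as below and that requesting zero train/test PSMs makes fine-tuning a safe no-op (both are assumptions, not verified against this PR):

import pandas as pd

from quantmsrescore.ms2_model_manager import MS2ModelManager


def test_init_on_cpu_exposes_ms2_model():
    # Construction should succeed on CPU and expose a trainable MS2 model.
    mgr = MS2ModelManager(mask_modloss=True, device="cpu", model_dir=None)
    assert mgr.ms2_model is not None


def test_fine_tuning_on_empty_frames_is_a_no_op():
    # With zero train/test PSMs requested, fine-tuning should skip training
    # rather than crash on empty dataframes.
    mgr = MS2ModelManager(mask_modloss=True, device="cpu", model_dir=None)
    mgr.ms2_fine_tuning(
        pd.DataFrame(),
        pd.DataFrame(),
        psm_num_to_train_ms2=0,
        psm_num_to_test_ms2=0,
    )
    assert mgr.model_str == "retrained_model"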
Comment on lines +134 to 158
        ctx,
        idxml: str,
        mzml,
        output: str,
        log_level,
        processes,
        feature_generators,
        force_model,
        find_best_model,
        only_features,
        ms2_model,
        ms2_model_dir,
        ms2_tolerance,
        ms2_tolerance_unit,
        calibration_set_size,
        valid_correlations_size,
        skip_deeplc_retrain,
        spectrum_id_pattern: str,
        psm_id_pattern: str,
        consider_modloss,
        transfer_learning,
        transfer_learning_test_ratio,
        save_retrain_model,
        epoch_to_train_ms2
):
Copilot AI Dec 3, 2025

The function msrescore2feature now has 24 parameters, making it difficult to maintain and understand. Consider grouping related parameters into configuration objects (e.g., TransferLearningConfig, MS2ToleranceConfig) or using a more structured approach like dataclasses to improve code organization and maintainability.

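One way that grouping could look; the dataclasses below are a hypothetical sketch keyed to the CLI flags, not code from this PR:

from dataclasses import dataclass


@dataclass
class MS2ToleranceConfig:
    # Fragment mass tolerance and its unit ("Da" or "ppm").
    tolerance: float = 0.05
    unit: str = "Da"


@dataclass
class TransferLearningConfig:
    # Knobs controlling AlphaPeptDeep MS2 fine-tuning.
    enabled: bool = False
    test_ratio: float = 0.3
    save_retrained_model: bool = False
    epochs_to_train_ms2: int = 20


def msrescore2feature_sketch(
    idxml: str,
    mzml: str,
    ms2: MS2ToleranceConfig | None = None,
    transfer: TransferLearningConfig | None = None,
) -> None:
    # Two grouped configs replace several loose keyword arguments.
    ms2 = ms2 or MS2ToleranceConfig()
    transfer = transfer or TransferLearningConfig()
    print(idxml, mzml, ms2.unit, transfer.enabled)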
Comment on lines 107 to 113
        else:
            self._ms2pip = False
            if "alphapeptdeep" in feature_annotators:
                self._alphapeptdeep = True
            elif "ms2pip" in feature_annotators and ms2_tolerance_unit == "ppm":
                self._alphapeptdeep = True
                logger.warning("MS2PIP only supports Da units. AlphaPeptDeep is enabled when setting ppm tolerance!")
Copilot AI Dec 3, 2025

The logic here silently overrides the user's feature generator choice when they specify "ms2pip" with ppm tolerance. If a user explicitly requests "ms2pip" but provides ppm tolerance, the code enables AlphaPeptDeep instead with only a warning. This could be confusing. Consider either: 1) raising an error to force the user to fix their configuration, or 2) automatically adjusting the feature_generators string to reflect what's actually being used, so downstream logic and logging are consistent.

Suggested change
-        else:
-            self._ms2pip = False
-            if "alphapeptdeep" in feature_annotators:
-                self._alphapeptdeep = True
-            elif "ms2pip" in feature_annotators and ms2_tolerance_unit == "ppm":
-                self._alphapeptdeep = True
-                logger.warning("MS2PIP only supports Da units. AlphaPeptDeep is enabled when setting ppm tolerance!")
+        elif "ms2pip" in feature_annotators and ms2_tolerance_unit == "ppm":
+            raise ValueError("MS2PIP only supports Da units. Please remove 'ms2pip' from feature_generators or set ms2_tolerance_unit to 'Da'.")
+        else:
+            self._ms2pip = False
+            if "alphapeptdeep" in feature_annotators:
+                self._alphapeptdeep = True

logger.info("Running MS2PIP model")
ms2pip_best_model, ms2pip_best_corr = ms2pip_generator._find_best_ms2pip_model(calibration_set)
else:
logger.info("MS2PIP model don't support ppm tolerance unit. Only consider AlphaPeptDeep model")
Copilot AI Dec 3, 2025

Incorrect grammar: "don't" should be "doesn't" (MS2PIP model is singular)

Suggested change
-            logger.info("MS2PIP model don't support ppm tolerance unit. Only consider AlphaPeptDeep model")
+            logger.info("MS2PIP model doesn't support ppm tolerance unit. Only consider AlphaPeptDeep model")

            try:
                ms2pip_generator = self._create_ms2pip_annotator()
                original_model = ms2pip_generator.model
                model_to_use = original_model
Copilot AI Dec 3, 2025

This assignment to 'model_to_use' is unnecessary as it is redefined before this value is used.

Suggested change
-                model_to_use = original_model

Comment on lines 491 to 497
                model_to_use = original_model
            except Exception as e:
                logger.error(f"Failed to initialize MS2PIP: {e}")
                raise
        else:
            original_model = alphapeptdeep_generator.model
            model_to_use = original_model
Copilot AI Dec 3, 2025

This assignment to 'model_to_use' is unnecessary as it is redefined before this value is used.

Suggested change
-                model_to_use = original_model
             except Exception as e:
                 logger.error(f"Failed to initialize MS2PIP: {e}")
                 raise
         else:
             original_model = alphapeptdeep_generator.model
-            model_to_use = original_model

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
quantmsrescore/alphapeptdeep.py (2)

184-188: Fix AttributeError: use self.consider_modloss instead of self._consider_modloss.

AlphaPeptDeepFeatureGenerator.__init__ sets self.consider_modloss (line 82), but line 185 references self._consider_modloss. This will raise AttributeError when this code executes.

Apply this diff:

                 try:
                     model_mgr = MS2ModelManager(
-                        mask_modloss=not self._consider_modloss,
+                        mask_modloss=not self.consider_modloss,
                         device="cpu",
                         model_dir=self.model_dir
                     )

815-887: Guard against empty calibration set and unmatched spectra to prevent crashes.

When calibration_set is empty (e.g., len(enumerated_psm_list_copy) * calibration_set_size < 1) or when no spectra match, match_intensity_df remains an empty list, causing pd.concat at line 869 to raise ValueError. This crashes small or edge-case datasets instead of gracefully skipping fine-tuning.

Add guards before concatenation and fine-tuning:

         # Get a calibration set, the % of psms to be used for calibration is defined by calibration_set_size
         calibration_set = enumerated_psm_list_copy[
                           : int(len(enumerated_psm_list_copy) * calibration_set_size)
                           ]
+
+        if not calibration_set:
+            logger.info(
+                "No PSMs available for MS2 transfer learning calibration; "
+                "skipping fine-tuning and using pretrained model."
+            )
+        else:
             precursor_df = psms_df[
                 psms_df["provenance_data"].isin([next(iter(psm.provenance_data.keys())) for psm in calibration_set])]
             
             ...
             
-        match_intensity_df = pd.concat(match_intensity_df, ignore_index=True)
-
-        psm_num_to_train_ms2 = int(len(calibration_set) * (1 - transfer_learning_test_ratio))
-        psm_num_to_test_ms2 = len(calibration_set) - psm_num_to_train_ms2
-
-        precursor_df.drop(columns=["frag_start_idx", "frag_stop_idx"], inplace=True)
-        precursor_df.rename(columns={
-            "match_start_idx": "frag_start_idx",
-            "match_stop_idx": "frag_stop_idx"
-        }, inplace=True)
-        precursor_df["frag_start_idx"] = precursor_df["frag_start_idx"].astype("int64")
-        precursor_df["frag_stop_idx"] = precursor_df["frag_stop_idx"].astype("int64")
-
-        model_mgr.ms2_fine_tuning(precursor_df,
-                                  match_intensity_df,
-                                  psm_num_to_train_ms2=psm_num_to_train_ms2,
-                                  psm_num_to_test_ms2=psm_num_to_test_ms2,
-                                  train_verbose=True,
-                                  epoch_to_train_ms2=epoch_to_train_ms2)
+            if not match_intensity_df:
+                logger.info(
+                    "No matched fragment intensities found for calibration set; "
+                    "skipping MS2 fine-tuning and using pretrained model."
+                )
+            else:
+                match_intensity_df = pd.concat(match_intensity_df, ignore_index=True)
+
+                psm_num_to_train_ms2 = int(len(calibration_set) * (1 - transfer_learning_test_ratio))
+                psm_num_to_test_ms2 = len(calibration_set) - psm_num_to_train_ms2
+
+                precursor_df.drop(columns=["frag_start_idx", "frag_stop_idx"], inplace=True)
+                precursor_df.rename(columns={
+                    "match_start_idx": "frag_start_idx",
+                    "match_stop_idx": "frag_stop_idx"
+                }, inplace=True)
+                precursor_df["frag_start_idx"] = precursor_df["frag_start_idx"].astype("int64")
+                precursor_df["frag_stop_idx"] = precursor_df["frag_stop_idx"].astype("int64")
+
+                model_mgr.ms2_fine_tuning(precursor_df,
+                                          match_intensity_df,
+                                          psm_num_to_train_ms2=psm_num_to_train_ms2,
+                                          psm_num_to_test_ms2=psm_num_to_test_ms2,
+                                          train_verbose=True,
+                                          epoch_to_train_ms2=epoch_to_train_ms2)
🧹 Nitpick comments (2)
quantmsrescore/alphapeptdeep.py (2)

856-856: Replace unused loop variable with underscore.

The loop variable psm_idx is unpacked but never used in the loop body. Replace it with _ to signal it's intentionally unused.

             # Process each PSM for this spectrum
-            for psm_idx, psm in psms_by_specid[spectrum_id]:
+            for _, psm in psms_by_specid[spectrum_id]:

842-849: Improve exception handling by chaining the original error.

The exception raised in the except block doesn't chain the original error, losing context about what caused the pattern matching failure. Use raise ... from err to preserve the error chain.

             # Match spectrum ID with provided regex
             match = spectrum_id_regex.search(spectrum.identifier)
             try:
                 spectrum_id = match[1]
             except (TypeError, IndexError):
                 raise exceptions.TitlePatternError(
                     f"Spectrum title pattern `{spectrum_id_pattern}` could not be matched to "
                     f"spectrum ID `{spectrum.identifier}`. "
                     " Are you sure that the regex contains a capturing group?"
-                )
+                ) from None

Note: `from None` explicitly suppresses the exception context; alternatively, catch the exception as `except (TypeError, IndexError) as e:` and raise `from e` to preserve the chain.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 660b96f and e22c8f5.

📒 Files selected for processing (1)
  • quantmsrescore/alphapeptdeep.py (18 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
quantmsrescore/alphapeptdeep.py (1)
quantmsrescore/ms2_model_manager.py (3)
  • MS2ModelManager (12-173)
  • save_ms2_model (172-173)
  • ms2_fine_tuning (42-59)
🪛 Ruff (0.14.7)
quantmsrescore/alphapeptdeep.py

845-849: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


845-849: Avoid specifying long messages outside the exception class

(TRY003)


856-856: Loop control variable psm_idx not used within loop body

(B007)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: build
  • GitHub Check: build (3.11)
  • GitHub Check: Codacy Static Code Analysis

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
quantmsrescore/annotator.py (1)

470-478: Tidy up _find_and_apply_ms2_model docstring and logging message

  • The docstring still documents a psm_list : PSMList parameter even though the method takes no arguments. That section should be removed or updated.
  • Minor grammar nit in the log message: logger.info("MS2PIP model don't support ppm tolerance unit...") should be "doesn't support".

These tweaks keep the public contract clear and the logs polished.

Also applies to: 519-536

🧹 Nitpick comments (4)
quantmsrescore/annotator.py (4)

27-36: __init__ docstring is out of sync with the actual signature and defaults

The parameter docs here have drifted from the constructor:

  • ms2_model_path is documented as default "./" but the code default is "models".
  • ms2_tolerance_unit and valid_correlations_size are not documented at all.
  • transfer_learning is documented as “bool, required”, but it has a default (False) and is therefore optional.

Please update the docstring so that all parameters (including ms2_tolerance_unit and valid_correlations_size) are listed with correct types, defaults, and descriptions, and so transfer_learning is described as optional with its default.

Also applies to: 39-85


104-117: Validate and/or normalize ms2_tolerance_unit to avoid silent misconfiguration

Right now only the exact strings "Da" and "ppm" are handled; any other value (e.g. "da" or "ppm "), especially when feature_generators contains only "ms2pip", will silently disable MS2PIP and may not enable AlphaPeptDeep either, resulting in no MS2 generator being run.

Consider:

  • Normalizing the unit (e.g. ms2_tolerance_unit = ms2_tolerance_unit.strip() and enforcing a canonical case), and
  • Raising a ValueError (or at least logging a warning) if the normalized value is not one of {"Da", "ppm"}.

This makes configuration errors fail fast instead of silently dropping MS2 features.

Also applies to: 129-145
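
A minimal sketch of that normalization, using a hypothetical helper name that is not part of this PR:

def normalize_ms2_tolerance_unit(unit: str) -> str:
    # Map case/whitespace variants onto the canonical spellings.
    canonical = {"da": "Da", "ppm": "ppm"}
    key = unit.strip().lower()
    if key not in canonical:
        raise ValueError(
            f"Unsupported ms2_tolerance_unit {unit!r}; expected 'Da' or 'ppm'."
        )
    return canonical[key]


# Variants normalize cleanly and misconfigurations fail fast:
assert normalize_ms2_tolerance_unit(" da ") == "Da"
assert normalize_ms2_tolerance_unit("PPM") == "ppm"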


320-359: Use logger.exception (and optionally narrow the exception type) for AlphaPeptDeep failures

The broad except Exception as e: here logs only str(e) and then swallows the error with an early return. That makes debugging real programming errors (TypeError, KeyError, etc.) inside the AlphaPeptDeep path harder.

At minimum, switch to logger.exception("Failed to apply AlphaPeptDeep annotation") so the traceback is preserved, and consider narrowing the caught exception type if you mainly expect model/IO errors. The same reasoning applies to the similar catches in _find_and_apply_ms2_model.


361-375: Update _create_alphapeptdeep_annotator docstring to reflect new parameters and return type

The docstring still only documents the model parameter and says it returns AlphaPeptDeep, but the signature now includes tolerance and tolerance_unit, and the function actually returns an AlphaPeptDeepAnnotator.

Please extend the docstring to cover tolerance and tolerance_unit and adjust the return section to name AlphaPeptDeepAnnotator.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e22c8f5 and cdf733b.

📒 Files selected for processing (1)
  • quantmsrescore/annotator.py (11 hunks)
🧰 Additional context used
🪛 Ruff (0.14.7)
quantmsrescore/annotator.py

355-355: Do not catch blind exception: Exception

(BLE001)


356-356: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


495-495: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build (3.11)
  • GitHub Check: build
🔇 Additional comments (1)
quantmsrescore/annotator.py (1)

427-468: Good extraction of shared AlphaPeptDeep validation/fallback logic

Factoring the “best vs original model” validation, application, and ms2_generator assignment into _apply_alphapeptdeep_model removes duplication and makes _find_and_apply_ms2_model much easier to follow. The early False return on double-validation failure is also a clear signal for the caller.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
quantmsrescore/ms2rescore.py (1)

133-157: Unused ctx and very long parameter list

  • ctx is not used, which triggers Ruff’s ARG001. If you don’t need the context, consider dropping @click.pass_context and the parameter, or simply renaming it to _ctx to signal intentional non‑use and silence the linter:
-@click.pass_context
-def msrescore2feature(
-        ctx,
+@click.pass_context
+def msrescore2feature(
+        _ctx,
         idxml: str,
         mzml,
         output: str,
         log_level,
         processes,
         feature_generators,
         force_model,
         find_best_model,
         only_features,
         ms2_model,
         ms2_model_dir,
         ms2_tolerance,
         ms2_tolerance_unit,
         calibration_set_size,
         valid_correlations_size,
         skip_deeplc_retrain,
         spectrum_id_pattern: str,
         psm_id_pattern: str,
         consider_modloss,
         transfer_learning,
         transfer_learning_test_ratio,
         save_retrain_model,
         epoch_to_train_ms2
):
  • The function now takes a large number of parameters, which hurts readability and maintainability. Longer‑term, consider grouping related settings (MS2 config, transfer‑learning config, paths) into small config objects or dataclasses and passing those instead.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cdf733b and 818374c.

📒 Files selected for processing (1)
  • quantmsrescore/ms2rescore.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
quantmsrescore/ms2rescore.py (2)
quantmsrescore/annotator.py (3)
  • build_idxml_data (147-193)
  • annotate (195-230)
  • write_idxml_file (232-256)
quantmsrescore/openms.py (1)
  • write_idxml_file (251-274)
🪛 Ruff (0.14.7)
quantmsrescore/ms2rescore.py

134-134: Unused function argument: ctx

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: build
  • GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (5)
quantmsrescore/ms2rescore.py (5)

76-80: Tolerance help text aligns with default value

The updated help text for --ms2_tolerance correctly reflects the 0.05 default; no functional issues here.


81-86: New ms2_tolerance_unit option is well-scoped

Adding a dedicated ms2_tolerance_unit flag with a constrained choice (Da/ppm) and a sensible default is consistent with the rest of the CLI and keeps unit handling explicit.


119-131: Transfer‑learning CLI flags are consistent with existing options

The new flags for enabling transfer learning, test ratio, model saving, and epoch count follow the existing style (boolean flags and numeric ratios) and give users clear control over MS2 fine‑tuning behavior.


219-240: Propagation of new options into FeatureAnnotator looks correct

The new arguments (ms2_tolerance_unit, consider_modloss, and all transfer‑learning controls) are passed through as keyword arguments, which keeps the call site clear and robust against parameter reordering in FeatureAnnotator.__init__. Wiring appears complete and consistent with the CLI options.


245-245: Conditional output write is reasonable

Conditionally writing the idXML only when output is provided is a sensible safeguard and matches the optional nature of the --output flag. No issues here.

Comment on lines 179 to +216
         If modloss ions are considered in the ms2 model. `modloss`
-        ions are mostly useful for phospho MS2 prediciton model.
+        ions are mostly useful for phospho MS2 prediction model.
         Defaults to True.
     transfer_learning: bool, optional
         Enabling transfer learning for AlphaPeptDeep MS2 prediction.
         Defaults to False.
     transfer_learning_test_ratio: float, optional
         The ratio of test data for MS2 transfer learning.
         Defaults to 0.3.
     save_retrain_model: bool, optional
         Save retrained MS2 model.
         Defaults to False.
     epoch_to_train_ms2: int, optional
         Number of epochs to train AlphaPeptDeep MS2 model. Defaults to 20.
Contributor

⚠️ Potential issue | 🟡 Minor

Docstring: default mismatch and missing ms2_tolerance_unit entry

Two small inconsistencies in the docstring:

  1. consider_modloss

    • Docstring says “Defaults to True.”
    • The CLI flag is a simple is_flag=True option, which defaults to False.
      Please align the documented default with the actual behavior (or change the option default if True is what you intend).
  2. ms2_tolerance_unit

    • There is a CLI option, but it’s not documented in the parameter list.
    • Suggest adding something like:
-    ms2_tolerance : float
-        The tolerance for MS²PIP annotation.
+    ms2_tolerance : float
+        The tolerance for MS²PIP annotation.
+    ms2_tolerance_unit : str, optional
+        Unit for the fragment mass tolerance, e.g. "Da" or "ppm".

This keeps the CLI behavior and Python docstring in sync for users and for generated docs.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In quantmsrescore/ms2rescore.py around lines 203 to 216, the docstring defaults
and parameter list are out of sync: update the docstring to reflect the actual
default for consider_modloss (CLI uses is_flag=True so default is False unless
you change the option) by either changing the written “Defaults to True.” to
“Defaults to False.” or change the CLI option default to True to match the
docstring; and add a documented parameter entry for ms2_tolerance_unit
describing its type, allowed values (e.g., 'ppm' or 'Da'), default value, and
brief purpose so the CLI option is reflected in the docstring and generated
docs.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (3)
quantmsrescore/ms2_model_manager.py (2)

12-37: Missing super().__init__() call in subclass constructor.

MS2ModelManager extends ModelManager but doesn't call the parent constructor. This may cause AttributeError when accessing inherited attributes like batch_size_to_train_ms2, warmup_epoch_to_train_ms2, and lr_to_train_ms2 used in train_ms2_model (lines 154-157).

Apply this diff to fix:

     def __init__(self,
                  mask_modloss: bool = False,
                  device: str = "gpu",
-                 model_dir: str = None,
+                 model_dir: str | None = None,
                  ):
+        super().__init__()
         self._train_psm_logging = True

129-136: Potential index mismatch between test_psm_df and tr_inten_df.

In this else branch, test_psm_df is copied from psm_df, but tr_inten_df is built from the full matched_intensity_df without filtering to match test_psm_df.index. When passed to self.ms2_model.test() at line 144, the indices may not align, leading to incorrect data pairing.

Apply this diff to filter matched_intensity_df to the test indices:

             else:
                 test_psm_df = psm_df.copy()
+                matched_intensity_subset = matched_intensity_df.loc[test_psm_df.index]
                 tr_inten_df = pd.DataFrame()
                 for frag_type in self.ms2_model.charged_frag_types:
-                    if frag_type in matched_intensity_df.columns:
-                        tr_inten_df[frag_type] = matched_intensity_df[frag_type]
+                    if frag_type in matched_intensity_subset.columns:
+                        tr_inten_df[frag_type] = matched_intensity_subset[frag_type]
                     else:
                         tr_inten_df[frag_type] = 0.0
quantmsrescore/ms2rescore.py (1)

204-207: Docstring default mismatch for consider_modloss.

The docstring states "Defaults to True" but the CLI option uses is_flag=True, which defaults to False. Update the docstring to reflect the actual behavior.

     consider_modloss: bool, optional
         If modloss ions are considered in the ms2 model. `modloss`
         ions are mostly useful for phospho MS2 prediction model.
-        Defaults to True.
+        Defaults to False.
🧹 Nitpick comments (4)
quantmsrescore/ms2_model_manager.py (1)

172-173: Consider making the save path configurable.

The output filename is hardcoded to "retrained_ms2.pth". For flexibility, consider accepting an optional path parameter or using self.model_str to differentiate saved models.

-    def save_ms2_model(self):
-        self.ms2_model.save("retrained_ms2.pth")
+    def save_ms2_model(self, output_path: str = "retrained_ms2.pth"):
+        self.ms2_model.save(output_path)
quantmsrescore/ms2rescore.py (1)

133-134: Unused ctx parameter.

The ctx parameter from @click.pass_context is never used in the function body. Either remove @click.pass_context decorator or document why the context is needed for future use.

If the context is not needed, remove the decorator and parameter:

-@click.pass_context
 def msrescore2feature(
-        ctx,
         idxml: str,
quantmsrescore/annotator.py (2)

355-357: Consider using logger.exception for better debugging.

Using logger.exception instead of logger.error will automatically include the stack trace, making it easier to diagnose failures.

         except Exception as e:
-            logger.error(f"Failed to apply AlphaPeptDeep annotation: {e}")
+            logger.exception(f"Failed to apply AlphaPeptDeep annotation: {e}")
             return  # Indicate failure through early return

493-495: Consider using logger.exception for better debugging.

Same as the previous comment—using logger.exception provides stack traces for easier debugging.

             except Exception as e:
-                logger.error(f"Failed to initialize MS2PIP: {e}")
+                logger.exception(f"Failed to initialize MS2PIP: {e}")
                 raise
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 818374c and 25222d5.

📒 Files selected for processing (3)
  • quantmsrescore/annotator.py (11 hunks)
  • quantmsrescore/ms2_model_manager.py (1 hunks)
  • quantmsrescore/ms2rescore.py (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
quantmsrescore/annotator.py (3)
quantmsrescore/alphapeptdeep.py (5)
  • AlphaPeptDeepAnnotator (390-719)
  • validate_features (437-500)
  • add_features (160-211)
  • add_features (502-556)
  • _find_best_ms2_model (629-681)
quantmsrescore/openms.py (1)
  • validate_features (390-415)
quantmsrescore/ms2pip.py (3)
  • validate_features (245-296)
  • add_features (298-340)
  • _find_best_ms2pip_model (413-460)
quantmsrescore/ms2rescore.py (2)
quantmsrescore/annotator.py (3)
  • build_idxml_data (147-193)
  • annotate (195-230)
  • write_idxml_file (232-256)
quantmsrescore/openms.py (1)
  • write_idxml_file (251-274)
🪛 Ruff (0.14.7)
quantmsrescore/ms2_model_manager.py

16-16: PEP 484 prohibits implicit Optional

Convert to T | None

(RUF013)

quantmsrescore/annotator.py

110-111: Avoid specifying long messages outside the exception class

(TRY003)


355-355: Do not catch blind exception: Exception

(BLE001)


356-356: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


494-494: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

quantmsrescore/ms2rescore.py

134-134: Unused function argument: ctx

(ARG001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: build (3.11)
  • GitHub Check: build
🔇 Additional comments (6)
quantmsrescore/ms2rescore.py (2)

81-86: LGTM!

The new ms2_tolerance_unit option is well-defined with a Choice validator ensuring only valid units (Da or ppm) are accepted.


119-131: LGTM!

The transfer learning CLI options are appropriately typed and provide sensible defaults. The flags follow the existing naming conventions in the file.

quantmsrescore/annotator.py (4)

107-117: LGTM!

The tolerance unit validation correctly prevents MS2PIP from being used with ppm units, which is not supported. The clear error message helps users understand how to fix the configuration.


427-468: LGTM!

The new _validate_and_apply_alphapeptdeep_model method consolidates validation and fallback logic cleanly, returning a boolean to indicate success/failure. This reduces code duplication in the calling code.


488-497: LGTM!

The MS2PIP annotator is now correctly initialized only when ms2_tolerance_unit == "Da", preventing initialization errors when ppm tolerance is used.


376-395: LGTM!

The _create_alphapeptdeep_annotator method correctly propagates the new tolerance_unit and transfer learning parameters to the AlphaPeptDeepAnnotator constructor.

Comment on lines +75 to +76
transfer_learning : bool, required
Whether to use MS2 transfer learning. Set to True to enable transfer learning for MS2 model.
Contributor

⚠️ Potential issue | 🟡 Minor

Docstring incorrectly marks transfer_learning as required.

The parameter has a default value of False (line 34), so it should be documented as optional, not required.

-        transfer_learning : bool, required
-            Whether to use MS2 transfer learning. Set to True to enable transfer learning for MS2 model.
+        transfer_learning : bool, optional
+            Whether to enable transfer learning for AlphaPeptDeep MS2 model.
+            Defaults to False.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        transfer_learning : bool, required
-            Whether to use MS2 transfer learning. Set to True to enable transfer learning for MS2 model.
+        transfer_learning : bool, optional
+            Whether to enable transfer learning for AlphaPeptDeep MS2 model.
+            Defaults to False.
🤖 Prompt for AI Agents
In quantmsrescore/annotator.py around lines 75 to 76, the docstring wrongly
labels the parameter transfer_learning as "required" even though it has a
default value of False (line 34); update the docstring to mark transfer_learning
as "optional" and adjust the description to reflect that it defaults to False
and enables MS2 transfer learning when set to True.

@ypriverol ypriverol merged commit 20d9928 into main Dec 4, 2025
3 of 4 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Dec 21, 2025