Conversation
support ppm tolerance
support ms2 transfer learning
update dev
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough
Adds unit-aware MS2 tolerance (Da/ppm), a new MS2ModelManager for MS² fine‑tuning and persistence, threads transfer‑learning/save/epoch controls through AlphaPeptDeep/annotator flows, extends the CLI with related flags, pins peptdeep to 1.4.0, adds small CI cleanup steps, and bumps version 0.0.12 → 0.0.13.
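For context, a minimal sketch (not taken from the PR; the helper name is hypothetical) of how a unit-aware MS2 tolerance is typically turned into an absolute m/z window, which is the core idea behind the Da/ppm switch:

```python
import numpy as np

def tolerance_in_da(mz: np.ndarray, ms2_tolerance: float, unit: str = "Da") -> np.ndarray:
    """Return per-peak absolute tolerances in Da.

    A ppm tolerance scales with the observed m/z, while a Da tolerance is constant.
    """
    if unit.lower() == "ppm":
        return mz * ms2_tolerance * 1e-6
    return np.full_like(mz, ms2_tolerance, dtype=float)

# Example: 20 ppm at m/z 500 and 1000 gives 0.01 and 0.02 Da windows.
print(tolerance_in_da(np.array([500.0, 1000.0]), 20, "ppm"))
```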
Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CLI as CLI (msrescore2feature)
    participant Annot as Annotator
    participant AP as AlphaPeptDeep
    participant MM as MS2ModelManager
    participant SM as SpectrumMatcher
    CLI->>Annot: init(..., ms2_tolerance_unit, transfer flags, epoch_to_train_ms2)
    Annot->>AP: create/annotator(ms2_tolerance_unit, model_mgr?)
    AP->>AP: custom_correlate(..., model_mgr?, ms2_tolerance_unit)
    alt transfer_learning enabled
        AP->>MM: ms2_fine_tuning(psms_df, match_intensity_df, ...)
        MM->>SM: build matched intensity frames (respect tolerance unit)
        MM->>MM: train_ms2_model(..., grid-search?, epochs)
        MM-->>AP: return trained model / model_weights
        AP->>SM: predict using trained model_weights
    else transfer_learning disabled
        AP->>SM: predict using selected existing model (MS2PIP or AlphaPeptDeep)
    end
    AP-->>Annot: results + model_weights
    alt save_retrain_model
        Annot->>MM: save_ms2_model()
    end
    Annot-->>CLI: annotated features returned
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches: ❌ Failed checks (1 warning), ✅ Passed checks (2 passed)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
quantmsrescore/annotator.py (1)
506-522: Potential UnboundLocalError: `ms2pip_generator` may be undefined when tolerance unit is not "Da".
When `ms2_tolerance_unit != "Da"`, `ms2pip_generator` is never initialized (lines 436-443), but it is referenced in this else branch (lines 507, 512, 519, 520). If `alphapeptdeep_best_corr <= ms2pip_best_corr` (e.g., both are -1 in an edge case), this code path will fail. Consider guarding this branch or ensuring the AlphaPeptDeep path is always taken when the unit is ppm:
         else:
+            if self._ms2_tolerance_unit != "Da":
+                logger.error("MS2PIP is not available with ppm tolerance unit")
+                return
             if ms2pip_best_model and ms2pip_generator.validate_features(psm_list=psm_list, model=ms2pip_best_model):

Or refactor the comparison to ensure the ppm case always selects AlphaPeptDeep.
quantmsrescore/alphapeptdeep.py (2)
1053-1082: Incorrect return type annotation and outdated docstring. The return type says
Tuple[Optional[np.ndarray], Dict[str, np.ndarray]]but the function returnsTuple[np.ndarray, np.ndarray](two arrays for b and y intensities). The docstring also references non-existent parameters:def _get_targets_for_psm( b_frag_mzs: np.array, y_frag_mzs: np.array, spectrum: ObservedSpectrum, ms2_tolerance: float, ms2_tolerance_unit: str -) -> Tuple[Optional[np.ndarray], Dict[str, np.ndarray]]: +) -> Tuple[np.ndarray, np.ndarray]: """ Get targets for a PSM from a spectrum. Parameters ---------- - psm - The PSM to get targets for. + b_frag_mzs + Array of b-ion fragment m/z values. + y_frag_mzs + Array of y-ion fragment m/z values. spectrum The spectrum to get targets from. - encoder - The encoder to use for peptide and peptidoform encoding. ms2_tolerance The MS2 tolerance to use. - model - The model name. - ion_types - The ion types to use. + ms2_tolerance_unit + Unit for tolerance ("Da" or "ppm"). Returns ------- - Tuple[Optional[np.ndarray], Dict[str, np.ndarray]] - A tuple containing the encoded peptidoform and the targets. + Tuple[np.ndarray, np.ndarray] + Tuple of (b_ion_intensities, y_ion_intensities), normalized by max intensity. """
559-568: `save_ms2_model()` call is in the wrong scope for exception handling.
If `save_ms2_model()` fails, it will be caught by the `NoMatchingSpectraFound` handler, which would incorrectly raise a `FeatureGeneratorException` about spectrum matching. Move the save call outside the try block:

             alphapeptdeep_results, model_weights = custom_correlate(
                 ...
             )
-            model_weights.save_ms2_model()
-
         except NoMatchingSpectraFound as e:
             raise FeatureGeneratorException(
                 ...
             ) from e
+        if self._save_retrain_model:
+            model_weights.save_ms2_model()
         self._calculate_features(psm_list_run, alphapeptdeep_results)
🧹 Nitpick comments (14)
quantmsrescore/ms2_model_manager.py (2)
13-17: Use `Optional[str]` for the `model_dir` parameter.
Per PEP 484, parameters with `None` as default should use `Optional[T]` or `T | None` type hints.

+from typing import Optional
+
 class MS2ModelManager(ModelManager):
     def __init__(self,
                  mask_modloss: bool = False,
                  device: str = "gpu",
-                 model_dir: str = None,
+                 model_dir: Optional[str] = None,
                  ):
173-174: Consider making the output filename configurable.
The hardcoded filename "retrained_ms2.pth" may cause overwrites if this method is called multiple times or from different contexts.

-    def save_ms2_model(self):
-        self.ms2_model.save("retrained_ms2.pth")
+    def save_ms2_model(self, filename: str = "retrained_ms2.pth"):
+        """Save the trained MS2 model to a file."""
+        self.ms2_model.save(filename)

quantmsrescore/ms2rescore.py (2)
122-124: Add an explicit `type=float` for `transfer_learning_test_ratio`.
Without an explicit type, Click may not properly validate or convert the input value.

 @click.option("--transfer_learning_test_ratio",
               help="The ratio of test data for MS2 transfer learning",
+              type=float,
               default=0.30)
215-216: Complete the docstring for `epoch_to_train_ms2`.
The documentation is missing the default value description.

     epoch_to_train_ms2: int, optional
         Epoch to train AlphaPeptDeep MS2 model.
+        Defaults to 20.

quantmsrescore/annotator.py (2)
352-354: Consider using `logger.exception()` for better error diagnostics.
`logger.exception()` automatically includes the stack trace, which aids debugging.

 except Exception as e:
-    logger.error(f"Failed to apply AlphaPeptDeep annotation: {e}")
+    logger.exception(f"Failed to apply AlphaPeptDeep annotation: {e}")
     return  # Indicate failure through early return
441-443: Consider using `logger.exception()` here as well for consistency.

 except Exception as e:
-    logger.error(f"Failed to initialize MS2PIP: {e}")
+    logger.exception(f"Failed to initialize MS2PIP: {e}")
     raise

quantmsrescore/alphapeptdeep.py (8)
43-43: Documentation missing for the new `ms2_tolerance_unit` parameter.
The docstring describes `ms2_tolerance` but doesn't document the new `ms2_tolerance_unit` parameter. Consider adding documentation for consistency.

     ms2_tolerance
-        MS2 mass tolerance in Da. Defaults to :py:const:`0.02`.
+        MS2 mass tolerance value. Defaults to :py:const:`0.02`.
+    ms2_tolerance_unit
+        Unit for MS2 mass tolerance, either "Da" or "ppm". Defaults to :py:const:`Da`.

Also applies to: 58-59
461-470: Duplicated model manager initialization logic.
This pattern for initializing `model_mgr` and determining `transfer_learning` is repeated identically in `validate_features`, `add_features`, and `_find_best_ms2_model`. Consider extracting it to a helper method:

def _get_model_manager(self) -> Tuple[MS2ModelManager, bool]:
    """Get or create model manager and determine transfer learning flag."""
    if self._peptdeep_model:
        return self._peptdeep_model, False
    model_mgr = MS2ModelManager(
        mask_modloss=not self._consider_modloss,
        device="cpu",
        model_dir=self.model_dir
    )
    return model_mgr, self._transfer_learning
798-800: Remove commented-out code.
These commented lines appear to be leftover debugging/development code. Remove them to improve readability.

-    # random.shuffle(calibration_set)
-    # train_set = calibration_set[: len(calibration_set) // 2]
-    # valid_set = calibration_set[len(calibration_set)]
828-832: Use `raise ... from` for proper exception chaining.
Per static analysis, exceptions raised within `except` clauses should use `raise ... from` to preserve the exception chain:

 except (TypeError, IndexError):
-    raise exceptions.TitlePatternError(
+    raise exceptions.TitlePatternError(
         f"Spectrum title pattern `{spectrum_id_pattern}` could not be matched to "
         f"spectrum ID `{spectrum.identifier}`. "
         " Are you sure that the regex contains a capturing group?"
-    )
+    ) from None
842-842: Unused loop variable `psm_idx`.
The variable `psm_idx` is not used within the loop body. Use an underscore prefix to indicate it is intentionally unused:

-    for psm_idx, psm in psms_by_specid[spectrum_id]:
+    for _psm_idx, psm in psms_by_specid[spectrum_id]:
1140-1142: Remove commented-out debug code.

-    # print(spectrum.mz)
-    # print(ms2_tolerance)  # 0.05
-    # spec_mz_tols = np.full_like(spectrum.mz, ms2_tolerance)
1088-1090: Remove commented-out debug code.

-    # print(spectrum.mz)
-    # print(ms2_tolerance)  # 0.05
-    # spec_mz_tols = np.full_like(spectrum.mz, ms2_tolerance)
741-741: Consider updating the type hint to `MS2ModelManager`.
The `model` parameter is typed as `ModelManager` but the function actually expects and uses `MS2ModelManager` (which calls the `ms2_fine_tuning` method). Consider updating the type annotation for better type safety:

-    model: ModelManager = None,
+    model: MS2ModelManager = None,
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- pyproject.toml (1 hunks)
- quantmsrescore/__init__.py (1 hunks)
- quantmsrescore/alphapeptdeep.py (17 hunks)
- quantmsrescore/annotator.py (10 hunks)
- quantmsrescore/ms2_model_manager.py (1 hunks)
- quantmsrescore/ms2rescore.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
quantmsrescore/ms2rescore.py (2)
quantmsrescore/annotator.py (3)
- build_idxml_data (144-190)
- annotate (192-227)
- write_idxml_file (229-253)

quantmsrescore/openms.py (1)
- write_idxml_file (251-274)
quantmsrescore/alphapeptdeep.py (2)
quantmsrescore/openms.py (1)
- OpenMSHelper (41-632)

quantmsrescore/ms2_model_manager.py (3)
- MS2ModelManager (12-174)
- save_ms2_model (173-174)
- ms2_fine_tuning (42-59)
🪛 Ruff (0.14.6)
quantmsrescore/ms2_model_manager.py
16-16: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
quantmsrescore/ms2rescore.py
134-134: Unused function argument: ctx
(ARG001)
quantmsrescore/annotator.py
352-352: Do not catch blind exception: Exception
(BLE001)
353-353: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
442-442: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
quantmsrescore/alphapeptdeep.py
828-832: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
828-832: Avoid specifying long messages outside the exception class
(TRY003)
842-842: Loop control variable psm_idx not used within loop body
(B007)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Agent
- GitHub Check: build
- GitHub Check: build (3.11)
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (10)
pyproject.toml (1)
6-6: LGTM! Version bump to 0.0.13 aligns with the new transfer learning and MS2 tolerance unit features introduced in this PR.
quantmsrescore/__init__.py (1)
7-7: LGTM! Version string matches the updated `pyproject.toml` version.

quantmsrescore/ms2_model_manager.py (1)
42-59: LGTM! The fine-tuning entry point correctly stores hyperparameters and delegates to `train_ms2_model`.

quantmsrescore/ms2rescore.py (3)
81-86: LGTM! Good use of `click.Choice` to validate the tolerance unit input, with a sensible default preserving backward compatibility.
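For reference, a minimal standalone sketch of this Click pattern (illustrative only; the real option in the PR may differ in help text and wiring):

```python
import click

@click.command()
@click.option(
    "--ms2_tolerance_unit",
    type=click.Choice(["Da", "ppm"]),  # rejects anything other than the two supported units
    default="Da",                      # preserves the previous Da-only behaviour
    help="Unit for the MS2 tolerance value",
)
def demo(ms2_tolerance_unit):
    click.echo(f"Using MS2 tolerance unit: {ms2_tolerance_unit}")

if __name__ == "__main__":
    demo()
```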
133-158: LGTM! Function signature correctly includes all new parameters. The unused `ctx` parameter is a common Click pattern when using `@click.pass_context`.
219-240: LGTM! All new parameters are correctly propagated to the `FeatureAnnotator` constructor.

quantmsrescore/annotator.py (3)
104-114: LGTM! Good handling of the MS2PIP limitation (Da-only) with automatic fallback to AlphaPeptDeep when ppm is specified, including an appropriate warning to inform the user.
349-350: LGTM! Logging the actual model used from `_peptdeep_model` provides more accurate traceability of which model was applied.
358-392: LGTM! Method signature correctly extended with the `tolerance_unit` parameter, and all transfer learning parameters properly propagated to the `AlphaPeptDeepAnnotator`.

quantmsrescore/alphapeptdeep.py (1)
875-879: Verify prediction flow after transfer learning. The `predict_all` call uses `psms_df` (all PSMs) for prediction after fine-tuning on the calibration subset. This appears intentional (fine-tune on best PSMs, then predict for all), but please confirm this is the expected behavior for the transfer learning workflow.
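The shape of that workflow, reduced to a hedged, generic sketch (placeholder model object, method names, and column names; not the project's actual API):

```python
import pandas as pd

def fine_tune_then_predict(model, psms_df: pd.DataFrame, calibration_set_size: float = 0.15):
    """Fine-tune on the best-scoring subset, then predict for every PSM."""
    # Take the top fraction of PSMs (by score) as the calibration/training subset.
    ranked = psms_df.sort_values("score", ascending=False)
    calibration_df = ranked.head(int(len(ranked) * calibration_set_size))

    model.fine_tune(calibration_df)   # train only on the calibration subset
    return model.predict(psms_df)     # but predict intensities for all PSMs
```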
Pull request overview
This pull request enables transfer learning functionality in AlphaPeptDeep for the quantms-rescoring workflow. The changes introduce the ability to fine-tune MS2 prediction models using experimental data, with configurable options for training epochs, test data ratios, and model saving. Additionally, the PR adds support for ppm-based MS2 tolerance units alongside the existing Da units.
Key Changes:
- Added transfer learning capabilities to AlphaPeptDeep MS2 prediction with new CLI options (`--transfer_learning`, `--transfer_learning_test_ratio`, `--save_retrain_model`, `--epoch_to_train_ms2`)
- Implemented MS2 tolerance unit selection (Da or ppm) via the `--ms2_tolerance_unit` parameter
- Created a new `MS2ModelManager` class to manage model training and fine-tuning operations
- Version bumped from 0.0.12 to 0.0.13
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 21 comments.
Show a summary per file
| File | Description |
|---|---|
| quantmsrescore/ms2rescore.py | Added CLI options for transfer learning parameters and MS2 tolerance unit; updated function signature to include new parameters |
| quantmsrescore/ms2_model_manager.py | New file implementing MS2ModelManager class for model fine-tuning and training operations |
| quantmsrescore/annotator.py | Updated FeatureAnnotator to support transfer learning and ppm tolerance; added logic to switch from MS2PIP to AlphaPeptDeep when ppm is selected |
| quantmsrescore/alphapeptdeep.py | Modified to support ModelManager architecture, added ms2_fine_tune function for transfer learning, updated custom_correlate to return model weights |
| `quantmsrescore/__init__.py` | Version bump to 0.0.13 |
| pyproject.toml | Version bump to 0.0.13 |
 def msrescore2feature(
-        ctx,
-        idxml: str,
-        mzml,
-        output: str,
-        log_level,
-        processes,
-        feature_generators,
-        only_features,
-        ms2_model_dir,
-        ms2_model,
-        force_model,
-        find_best_model,
-        ms2_tolerance,
-        calibration_set_size,
-        valid_correlations_size,
-        skip_deeplc_retrain,
-        spectrum_id_pattern: str,
-        psm_id_pattern: str,
-        consider_modloss
+        ctx,
+        idxml: str,
+        mzml,
+        output: str,
+        log_level,
+        processes,
+        feature_generators,
+        only_features,
+        ms2_model_dir,
+        ms2_model,
+        ms2_tolerance,
+        ms2_tolerance_unit,
+        force_model,
+        find_best_model,
+        calibration_set_size,
+        valid_correlations_size,
+        skip_deeplc_retrain,
+        spectrum_id_pattern: str,
+        psm_id_pattern: str,
+        consider_modloss,
+        transfer_learning,
+        transfer_learning_test_ratio,
+        save_retrain_model,
+        epoch_to_train_ms2
 ):
The function parameter order does not match the Click decorator order. Click passes arguments to the function in the order they are decorated (bottom-to-top in decorators, which becomes top-to-bottom in the function). The current parameter order will cause arguments to be assigned to the wrong parameters.
The correct order based on the decorators should be:
def msrescore2feature(
ctx,
idxml: str,
mzml,
output: str,
log_level,
processes,
feature_generators,
force_model,
find_best_model,
only_features,
ms2_model,
ms2_model_dir,
ms2_tolerance,
ms2_tolerance_unit,
calibration_set_size,
valid_correlations_size,
skip_deeplc_retrain,
spectrum_id_pattern: str,
psm_id_pattern: str,
consider_modloss,
transfer_learning,
transfer_learning_test_ratio,
save_retrain_model,
epoch_to_train_ms2
):Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
quantmsrescore/annotator.py (1)
436-528: Critical: Undefined variable `ms2pip_generator` in fallback path when `ms2_tolerance_unit != "Da"`.
When `ms2_tolerance_unit` is not "Da", `ms2pip_generator` is never created (lines 436-443 are skipped). However, if `alphapeptdeep_best_corr <= ms2pip_best_corr` (e.g., both are -1 due to correlation failures), the else branch at lines 506-522 will execute and reference `ms2pip_generator` at line 507, causing a `NameError`. Apply this diff to guard the MS2PIP path:
if alphapeptdeep_best_corr > ms2pip_best_corr: model_to_use = alphapeptdeep_best_model # AlphaPeptdeep only has generic model if alphapeptdeep_best_model and alphapeptdeep_generator.validate_features(psm_list=psm_list, psms_df=psms_df, model=alphapeptdeep_best_model): logger.info( f"Using best model: {alphapeptdeep_best_model} with correlation: {alphapeptdeep_best_corr:.4f}") else: # Fallback to original model if best model doesn't validate if alphapeptdeep_generator.validate_features(psm_list, psms_df, model=alphapeptdeep_best_model): logger.warning("Best model validation failed, falling back to original model") else: logger.error("Both best model and original model validation failed") return # Exit early since no valid model is available # Apply the selected model alphapeptdeep_generator.model = model_to_use alphapeptdeep_generator.add_features(psm_list, psms_df) logger.info(f"Successfully applied AlphaPeptDeep annotation using model: {model_to_use}") self.ms2_generator = "AlphaPeptDeep" - else: + elif self._ms2_tolerance_unit == "Da": + # Only use MS2PIP if tolerance unit is Da (ms2pip_generator exists) if ms2pip_best_model and ms2pip_generator.validate_features(psm_list=psm_list, model=ms2pip_best_model): model_to_use = ms2pip_best_model logger.info(f"Using best model: {model_to_use} with correlation: {ms2pip_best_corr:.4f}") else: # Fallback to original model if best model doesn't validate if ms2pip_generator.validate_features(psm_list, model=original_model): logger.warning("Best model validation failed, falling back to original model") else: logger.error("Both best model and original model validation failed") return # Exit early since no valid model is available # Apply the selected model ms2pip_generator.model = model_to_use ms2pip_generator.add_features(psm_list) logger.info(f"Successfully applied MS2PIP annotation using model: {model_to_use}") self.ms2_generator = "MS2PIP" + else: + logger.error("No valid MS2 model available for the current configuration") + return
♻️ Duplicate comments (2)
quantmsrescore/annotator.py (2)
81-82: Docstring type mismatch: `epoch_to_train_ms2` is documented as `bool` but is `int`.
The parameter type should be `int` to match the function signature and default value of 20. Apply this diff:

-    epoch_to_train_ms2: bool, optional
-        Epoch to train AlphaPeptDeep MS2 model.
+    epoch_to_train_ms2: int, optional
+        Number of epochs to train AlphaPeptDeep MS2 model.
+        Defaults to 20.
494-494: Logic error: Fallback validates `alphapeptdeep_best_model` twice instead of `original_model`.
The comment indicates "falling back to original model," but the code validates `alphapeptdeep_best_model` again on line 494, which is the same model that just failed validation on lines 487-489. Apply this diff:
else: # Fallback to original model if best model doesn't validate - if alphapeptdeep_generator.validate_features(psm_list, psms_df, model=alphapeptdeep_best_model): + if alphapeptdeep_generator.validate_features(psm_list, psms_df, model=original_model): logger.warning("Best model validation failed, falling back to original model") else: logger.error("Both best model and original model validation failed") return # Exit early since no valid model is available
🧹 Nitpick comments (1)
quantmsrescore/annotator.py (1)
104-114: Consider simplifying the MS2PIP/AlphaPeptDeep selection logic. The nested conditionals mix feature selection with unit validation. Extracting the decision into a helper method or restructuring the logic could improve readability.
For example:
# Determine MS2 predictor based on requested features and tolerance unit if "ms2pip" in feature_annotators: if ms2_tolerance_unit == "Da": self._ms2pip = True else: # ppm self._ms2pip = False self._alphapeptdeep = True logger.warning("MS2PIP only supports Da units. AlphaPeptDeep is enabled when setting ppm tolerance!") else: self._ms2pip = False if "alphapeptdeep" in feature_annotators: self._alphapeptdeep = True elif not self._ms2pip: self._alphapeptdeep = False
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
quantmsrescore/annotator.py(10 hunks)
🧰 Additional context used
🪛 Ruff (0.14.6)
quantmsrescore/annotator.py
352-352: Do not catch blind exception: Exception
(BLE001)
353-353: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
442-442: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build (3.11)
- GitHub Check: build
- GitHub Check: Codacy Static Code Analysis
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Actionable comments posted: 0
♻️ Duplicate comments (7)
quantmsrescore/alphapeptdeep.py (7)
187-198: Type mismatch: passing string where ModelManager expected.
`self.model` is a string (e.g., "generic"), but `custom_correlate` now expects a `ModelManager` object at line 741. This will cause a runtime error. Either create a `ModelManager` instance here (similar to how `AlphaPeptDeepAnnotator` does it at lines 461-470), or reconsider whether `AlphaPeptDeepFeatureGenerator` should support the new model manager architecture.
751-751: Consider using uppercase type hints for better compatibility.
The lowercase `tuple[...]` and `list[...]` require Python 3.9+. For broader compatibility, consider using `Tuple` and `List` from the `typing` module (already imported at line 9), or add `from __future__ import annotations` at the top of the file.
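Either route is standard Python; a tiny self-contained sketch of the future-import option for context:

```python
from __future__ import annotations  # postponed evaluation: built-in generics are safe pre-3.9

def split_frags(mzs: list[float]) -> tuple[list[float], list[float]]:
    """Toy helper: split a flat m/z list into two halves (e.g., b- and y-ion blocks)."""
    mid = len(mzs) // 2
    return mzs[:mid], mzs[mid:]
```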
798-800: Remove or document commented-out code. These commented-out lines suggest incomplete functionality. If a train/valid split is not needed, remove the comments. If it might be needed later, add a TODO comment explaining when/why it would be implemented.
1088-1090: Remove commented-out debug code. These commented-out print statements and the redundant assignment should be removed to keep the code clean.
461-470: Extract repeated model manager initialization logic.
This pattern of checking `self._peptdeep_model` and initializing `MS2ModelManager` is repeated three times (here, lines 532-541, and lines 669-678). Extract it into a helper method to reduce duplication and improve maintainability. Also, the if-else at lines 467-470 can be simplified to `transfer_learning = self._transfer_learning`.
559-559: Make the model save conditional on the `_save_retrain_model` flag.
The model is saved unconditionally, but it should only be saved when `self._save_retrain_model` is True. Apply this diff:

-        model_weights.save_ms2_model()
+        if self._save_retrain_model:
+            model_weights.save_ms2_model()
1106-1132: Fix incorrect return type annotation and outdated docstring.
The return type annotation says `Tuple[Optional[np.ndarray], Dict[str, np.ndarray]]`, but the function returns a `pd.DataFrame` (line 1152). Also, the docstring parameters (`psm`, `encoder`, `model`, `ion_types`) don't match the actual function parameters. Update the signature and docstring:
def _get_targets_df_for_psm( mz_df: pd.DataFrame, spectrum: ObservedSpectrum, ms2_tolerance: float, ms2_tolerance_unit: str -) -> Tuple[Optional[np.ndarray], Dict[str, np.ndarray]]: +) -> pd.DataFrame: """ Get targets for a PSM from a spectrum. Parameters ---------- - psm - The PSM to get targets for. + mz_df + DataFrame containing theoretical m/z values. spectrum The spectrum to get targets from. - encoder - The encoder to use for peptide and peptidoform encoding. ms2_tolerance The MS2 tolerance to use. - model - The model name. - ion_types - The ion types to use. + ms2_tolerance_unit + Unit for tolerance ("Da" or "ppm"). Returns ------- - Tuple[Optional[np.ndarray], Dict[str, np.ndarray]] - A tuple containing the encoded peptidoform and the targets. + pd.DataFrame + DataFrame with matched intensities, normalized by max intensity. """
🧹 Nitpick comments (1)
quantmsrescore/alphapeptdeep.py (1)
828-832: Add exception chaining for better error context.
When raising an exception within an `except` clause, use `raise ... from err` to preserve the traceback chain and provide better debugging context. Apply this diff:
- except (TypeError, IndexError): + except (TypeError, IndexError) as err: raise exceptions.TitlePatternError( f"Spectrum title pattern `{spectrum_id_pattern}` could not be matched to " f"spectrum ID `{spectrum.identifier}`. " " Are you sure that the regex contains a capturing group?" - ) + ) from err
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
quantmsrescore/alphapeptdeep.py(17 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
quantmsrescore/alphapeptdeep.py (1)
quantmsrescore/ms2_model_manager.py (3)
- MS2ModelManager (12-174)
- save_ms2_model (173-174)
- ms2_fine_tuning (42-59)
🪛 Ruff (0.14.6)
quantmsrescore/alphapeptdeep.py
828-832: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
828-832: Avoid specifying long messages outside the exception class
(TRY003)
842-842: Loop control variable psm_idx not used within loop body
(B007)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build (3.11)
- GitHub Check: build
Actionable comments posted: 0
🧹 Nitpick comments (1)
.github/workflows/python-package.yml (1)
23-27: Good operational housekeeping: disk space freed early in the workflow. This step removes cached tools to free ~3–5 GB on GitHub's hosted runner before the build begins. Placement after checkout but before Python setup is appropriate.
Suggest monitoring disk usage post-merge to ensure the freed space is sufficient for the expanded test suite and dependencies introduced by this PR's MS2 transfer learning and new MS2ModelManager features. If tests still fail with "insufficient disk space" errors, consider further cleanup (e.g., removing test data or caches mid-workflow).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
.github/workflows/python-package.yml(1 hunks).github/workflows/python-publish.yml(1 hunks)environment.yml(1 hunks)requirements.txt(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- .github/workflows/python-publish.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build (3.11)
- GitHub Check: build
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (2)
environment.yml (1)
22-22: Good consistency with requirements.txt. Both requirements.txt and environment.yml pin peptdeep to 1.4.0 using appropriate syntax for each package manager. Ensure this version is verified for compatibility with the new MS2 transfer learning features (as flagged in requirements.txt).
requirements.txt (1)
12-12: Exact pinning to peptdeep 1.4.0 is appropriate for MS2 model features.The pinning is justified: MS2ModelManager directly uses pDeepModel and MS2 utilities (ms2_fine_tuning, train_ms2_model, grid_nce_search) that are introduced or formalized in peptdeep 1.4.0. However, remove the requirement to verify compatibility with ms2rescore==3.1.4, as ms2rescore and peptdeep are separate frameworks without a direct dependency relationship; ms2rescore 3.1.4 uses MS2PIP, DeepLC, and IM2Deep as integrated feature generators, while peptdeep is an independent MS2 prediction framework used separately in this codebase.
Likely an incorrect or invalid review comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
quantmsrescore/annotator.py (1)
19-25: Keep class/constructor docstrings in sync with new MS2 and transfer-learning options. The runtime behavior looks good, but the docs are lagging:
- The class docstring (Line 21) still says “using MS2PIP and DeepLC models” while the class now also supports AlphaPeptDeep.
- `ms2_tolerance_unit` is a public argument to `__init__` but isn't documented, so callers don't see the allowed values ("Da" vs "ppm").
- `transfer_learning` is a public flag but isn't described in the parameter list (only its ratio, save flag, and epochs are documented).

Updating the docstrings will make the new knobs discoverable and avoid confusion for CLI/API users.
You can patch the constructor docstring along these lines:
- Annotator for peptide-spectrum matches (PSMs) using MS2PIP and DeepLC models. + Annotator for peptide-spectrum matches (PSMs) using MS2PIP, AlphaPeptDeep and DeepLC models. @@ - ms2_tolerance : float, optional - MS2 tolerance for feature generation (default: 0.05). + ms2_tolerance : float, optional + MS2 tolerance for feature generation (default: 0.05). + ms2_tolerance_unit : str, optional + Unit for MS2 tolerance, either "Da" or "ppm" (default: "Da"). @@ - consider_modloss: bool, optional + consider_modloss : bool, optional If modloss ions are considered in the ms2 model. `modloss` ions are mostly useful for phospho MS2 prediction model. Defaults to True. + transfer_learning : bool, optional + Enable AlphaPeptDeep MS2 transfer learning (default: False). transfer_learning_test_ratio: float, optional The ratio of test data for MS2 transfer learning. Defaults to 0.3. save_retrain_model: bool, optional Save retrained MS2 model. Defaults to False. epoch_to_train_ms2: int, optional Epochs to train AlphaPeptDeep MS2 model. Defaults to 20.Also applies to: 27-35, 49-83
♻️ Duplicate comments (1)
quantmsrescore/annotator.py (1)
436-455: Model-selection/fallback logic in_find_and_apply_ms2_modelis still inconsistent and can misbehaveThere are still a few intertwined issues in this method (some of which were pointed out in prior reviews):
- A single
original_modelvariable is used for both MS2PIP and AlphaPeptDeep, making fallback semantics ambiguous and, in practice, incorrect for AlphaPeptDeep.- The AlphaPeptDeep fallback path (Lines 489–499) re-validates
alphapeptdeep_best_modelinstead of the original model and never updatesmodel_to_use, so you never truly “fall back to the original model” as the comment claims.ms2pip_generatoris only created whenself._ms2_tolerance_unit == "Da", but the MS2PIP branch (Lines 510–522) is guarded only by thealphapeptdeep_best_corr > ms2pip_best_corrcomparison. With the current initialization (ms2pip_best_corr = -1), this is usually safe for"ppm", but it’s still possible in edge cases to reach that else-branch without a definedms2pip_generator.- For
"ppm"tolerance, MS2PIP results are not computed, but the code structure still implies a comparison withms2pip_best_corr, which is a sentinel value rather than a real correlation.Together, these make the logic hard to reason about and could cause incorrect fallbacks or runtime errors in corner cases. I’d recommend separating “original AlphaPeptDeep model” from “original MS2PIP model” and explicitly restricting the MS2PIP path to the Da-unit case.
A minimal refactor that addresses these points could look like:
- # Initialize AlphaPeptDeep annotator - try: - alphapeptdeep_generator = self._create_alphapeptdeep_annotator(model="generic") - except Exception as e: - logger.error(f"Failed to initialize AlphaPeptDeep: {e}") - raise - - # Initialize MS2PIP annotator - if self._ms2_tolerance_unit == "Da": - try: - ms2pip_generator = self._create_ms2pip_annotator() - original_model = ms2pip_generator.model - model_to_use = original_model - except Exception as e: - logger.error(f"Failed to initialize MS2PIP: {e}") - raise - else: - original_model = alphapeptdeep_generator.model - model_to_use = original_model + # Initialize AlphaPeptDeep annotator + try: + alphapeptdeep_generator = self._create_alphapeptdeep_annotator(model="generic") + alphapeptdeep_original_model = alphapeptdeep_generator.model + except Exception as e: + logger.error(f"Failed to initialize AlphaPeptDeep: {e}") + raise + + # Initialize MS2PIP annotator only when Da tolerance is requested + if self._ms2_tolerance_unit == "Da": + try: + ms2pip_generator = self._create_ms2pip_annotator() + ms2pip_original_model = ms2pip_generator.model + model_to_use = ms2pip_original_model + except Exception as e: + logger.error(f"Failed to initialize MS2PIP: {e}") + raise + else: + model_to_use = alphapeptdeep_original_model @@ - ms2pip_best_corr = -1 # Initial MS2PIP best correlation - ms2pip_best_model = None - - # Determine which model to use based on configuration and validation - if self._ms2_tolerance_unit == "Da": - # Save original model for reference - logger.info("Running MS2PIP model") - ms2pip_best_model, ms2pip_best_corr = ms2pip_generator._find_best_ms2pip_model(calibration_set) - else: - original_model = alphapeptdeep_best_model + ms2pip_best_corr = float("-inf") # Initial MS2PIP best correlation + ms2pip_best_model = None + + # Determine which model to use based on configuration and validation + if self._ms2_tolerance_unit == "Da": + logger.info("Running MS2PIP model") + ms2pip_best_model, ms2pip_best_corr = ms2pip_generator._find_best_ms2pip_model(calibration_set) @@ - if alphapeptdeep_best_corr > ms2pip_best_corr: - model_to_use = alphapeptdeep_best_model # AlphaPeptdeep only has generic model - if alphapeptdeep_best_model and alphapeptdeep_generator.validate_features(psm_list=psm_list, - psms_df=psms_df, - model=alphapeptdeep_best_model): + if alphapeptdeep_best_corr > ms2pip_best_corr: + model_to_use = alphapeptdeep_best_model or alphapeptdeep_original_model + if alphapeptdeep_best_model and alphapeptdeep_generator.validate_features( + psm_list=psm_list, psms_df=psms_df, model=alphapeptdeep_best_model + ): logger.info( f"Using best model: {alphapeptdeep_best_model} with correlation: {alphapeptdeep_best_corr:.4f}") else: # Fallback to original model if best model doesn't validate - if alphapeptdeep_generator.validate_features(psm_list, psms_df, model=alphapeptdeep_best_model): + if alphapeptdeep_generator.validate_features( + psm_list, psms_df, model=alphapeptdeep_original_model + ): + model_to_use = alphapeptdeep_original_model logger.warning("Best model validation failed, falling back to original model") @@ - else: + elif self._ms2_tolerance_unit == "Da": if ms2pip_best_model and ms2pip_generator.validate_features(psm_list=psm_list, model=ms2pip_best_model): model_to_use = ms2pip_best_model logger.info(f"Using best model: {model_to_use} with correlation: {ms2pip_best_corr:.4f}") else: # Fallback to original model if best model doesn't validate - if ms2pip_generator.validate_features(psm_list, - model=original_model if original_model != 
"generic" else "HCD2021"): + if ms2pip_generator.validate_features( + psm_list, + model=ms2pip_original_model if ms2pip_original_model != "generic" else "HCD2021", + ): logger.warning("Best model validation failed, falling back to original model") - model_to_use = original_model if original_model != "generic" else "HCD2021" + model_to_use = ms2pip_original_model if ms2pip_original_model != "generic" else "HCD2021"This keeps the external behavior the same in the common cases but:
- cleanly separates “original” models per generator,
- ensures fallback actually uses the intended original model,
- and guarantees MS2PIP is only used (and its generator is only referenced) when the unit is Da.
Also applies to: 474-488, 489-503, 510-522
🧹 Nitpick comments (3)
quantmsrescore/annotator.py (3)
102-116: ms2_tolerance_unit gating and transfer-learning wiring look consistentThe way you:
- gate
self._ms2piponms2_tolerance_unit == "Da"and- auto-enable
AlphaPeptDeep(with a warning) whenms2_tolerance_unit == "ppm"and only MS2PIP was requested, and- thread
ms2_tolerance_unit,transfer_learning,transfer_learning_test_ratio,save_retrain_model, andepoch_to_train_ms2through toAlphaPeptDeepAnnotatoris internally consistent and matches the intended behavior described in the PR (MS2PIP only under Da; AlphaPeptDeep under ppm). No functional issues spotted here.
If you want to harden this further, a small optional improvement would be to normalize
ms2_tolerance_unitonce in__init__(e.g.,lower().strip()) so that"PPM"/"da"are handled cleanly, and then compare against those normalized values.Also applies to: 124-127, 140-143, 375-393
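As an illustration only (a hypothetical helper, not the project's code), the unit-aware gating plus unit normalization described here could be reduced to something like:

```python
def select_ms2_predictor(requested, ms2_tolerance_unit):
    """Pick MS2 predictors given the tolerance unit; MS2PIP is assumed Da-only."""
    unit = ms2_tolerance_unit.strip().lower()  # "Da", "da", "PPM", ... -> "da"/"ppm"
    if unit not in {"da", "ppm"}:
        raise ValueError(f"Unsupported MS2 tolerance unit: {ms2_tolerance_unit!r}")

    selected = set(requested)
    if unit == "ppm" and "ms2pip" in selected:
        # MS2PIP only supports Da tolerances; fall back to AlphaPeptDeep instead.
        selected.discard("ms2pip")
        selected.add("alphapeptdeep")
    return selected

print(select_ms2_predictor({"ms2pip", "deeplc"}, "PPM"))  # {'alphapeptdeep', 'deeplc'}
```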
348-352: Logging uses private_peptdeep_modelattribute; consider a public accessorUsing
alphapeptdeep_generator._peptdeep_modelin the log message works but reaches into a private attribute ofAlphaPeptDeepAnnotator. This tightens coupling to that class’s internals and could break if its implementation changes.If feasible, consider exposing a small public property on
AlphaPeptDeepAnnotator(e.g.,current_model_name) and logging that instead:- logger.info( - f"Successfully applied AlphaPeptDeep annotation using model: {alphapeptdeep_generator._peptdeep_model}") + logger.info( + f"Successfully applied AlphaPeptDeep annotation using model: {alphapeptdeep_generator.current_model_name}" + )(Assuming you add
current_model_nameon the annotator side.)
359-393: _create_alphapeptdeep_annotator docstring is out of sync with the signatureThe factory now accepts
toleranceandtolerance_unitand always returns anAlphaPeptDeepAnnotator, but the docstring still only documentsmodeland says it returns “AlphaPeptDeep”.You can align the docs and add a return type hint like this:
- def _create_alphapeptdeep_annotator(self, model: Optional[str] = None, tolerance: Optional[float] = None, - tolerance_unit: Optional[str] = None): + def _create_alphapeptdeep_annotator( + self, + model: Optional[str] = None, + tolerance: Optional[float] = None, + tolerance_unit: Optional[str] = None, + ) -> AlphaPeptDeepAnnotator: @@ - Parameters - ---------- - model : str, optional - AlphaPeptDeep model name to use, defaults to generic if None. + Parameters + ---------- + model : str, optional + AlphaPeptDeep model name to use, defaults to "generic" if None. + tolerance : float, optional + Override for MS2 tolerance; falls back to the annotator setting if None. + tolerance_unit : str, optional + Override for MS2 tolerance unit ("Da" or "ppm"); falls back to the annotator setting if None. @@ - AlphaPeptDeep - Configured AlphaPeptDeep annotator. + AlphaPeptDeepAnnotator + Configured AlphaPeptDeep annotator.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
quantmsrescore/annotator.py(11 hunks)
🧰 Additional context used
🪛 Ruff (0.14.7)
quantmsrescore/annotator.py
353-353: Do not catch blind exception: Exception
(BLE001)
354-354: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
450-450: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build (3.11)
- GitHub Check: build
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
quantmsrescore/alphapeptdeep.py (2)
392-435: `AlphaPeptDeepAnnotator.__init__` super call is broken and drops `consider_modloss`.
The current super call passes `args` and `kwargs` without unpacking and includes `kwargs=kwargs`, which doesn't match the base constructor and will raise a `TypeError`. It also doesn't forward `consider_modloss` to `AlphaPeptDeepFeatureGenerator`. Recommend:
- super().__init__( - args, - model=model, - ms2_tolerance=ms2_tolerance, - ms2_tolerance_unit=ms2_tolerance_unit, - spectrum_path=spectrum_path, - spectrum_id_pattern=spectrum_id_pattern, - model_dir=model_dir, - processes=processes, - kwargs=kwargs, - ) + super().__init__( + *args, + model=model, + ms2_tolerance=ms2_tolerance, + ms2_tolerance_unit=ms2_tolerance_unit, + spectrum_path=spectrum_path, + spectrum_id_pattern=spectrum_id_pattern, + model_dir=model_dir, + consider_modloss=consider_modloss, + processes=processes, + **kwargs, + )This restores the expected
*args/**kwargssemantics and keepsconsider_modlossconsistent between base and subclass.
1039-1082: Fix_get_targets_for_psmtype annotation and docstring to match implementation.The function now takes b/y fragment m/z arrays and returns two normalized intensity arrays, but the signature and docstring still describe a different API:
- Return type is annotated as
Tuple[Optional[np.ndarray], Dict[str, np.ndarray]], yet you actually return twonp.ndarrays.- Docstring mentions
psm,encoder,model,ion_typeswhich are no longer parameters.Recommend:
-def _get_targets_for_psm( - b_frag_mzs: np.array, - y_frag_mzs: np.array, - spectrum: ObservedSpectrum, - ms2_tolerance: float, - ms2_tolerance_unit: str -) -> Tuple[Optional[np.ndarray], Dict[str, np.ndarray]]: +def _get_targets_for_psm( + b_frag_mzs: np.ndarray, + y_frag_mzs: np.ndarray, + spectrum: ObservedSpectrum, + ms2_tolerance: float, + ms2_tolerance_unit: str, +) -> Tuple[np.ndarray, np.ndarray]: @@ - Parameters - ---------- - psm - The PSM to get targets for. - spectrum - The spectrum to get targets from. - encoder - The encoder to use for peptide and peptidoform encoding. - ms2_tolerance - The MS2 tolerance to use. - model - The model name. - ion_types - The ion types to use. + Parameters + ---------- + b_frag_mzs, y_frag_mzs + Theoretical fragment m/z values for b and y ions (1D arrays). + spectrum + Observed spectrum providing m/z and intensity arrays. + ms2_tolerance + MS2 tolerance value. + ms2_tolerance_unit + Unit of the MS2 tolerance ("Da" or "ppm"). @@ - Tuple[Optional[np.ndarray], Dict[str, np.ndarray]] - A tuple containing the encoded peptidoform and the targets. + Tuple[np.ndarray, np.ndarray] + Normalized matched intensities for b and y ions.This will make static typing and docs accurate.
♻️ Duplicate comments (2)
quantmsrescore/alphapeptdeep.py (2)
526-546: Conditional saving of the retrained model is correctly wired.
The `add_features` path now mirrors `validate_features` in using `_get_model_manager_and_transfer_flag()`, propagates all transfer-learning knobs, and only calls `save_ms2_model()` when `_save_retrain_model` is True, which addresses the earlier unconditional-save concern.
857-865: Prediction path after fine-tuning looks correct and now works for both branches.
Switching to `predict_all(precursor_df=psms_df, ...)` and then unpacking `precursor_df` from `predictions` removes the earlier precursor_df-before-assignment issue and ensures predictions run regardless of whether transfer learning happened.
🧹 Nitpick comments (2)
quantmsrescore/alphapeptdeep.py (2)
35-77: Document and align `ms2_tolerance_unit` semantics in the constructor docstring.
You've added `ms2_tolerance_unit` and store it on `self`, but the docstring still says "MS2 mass tolerance in Da" and doesn't mention the unit parameter. Consider updating the docstring to describe both `ms2_tolerance` and `ms2_tolerance_unit` (including the allowed values "Da"/"ppm") to avoid confusion for callers.
824-835: Unused `psm_idx` loop variable in the calibration loop.
In the calibration loop you never use `psm_idx`, which triggers Ruff's B007 warning and slightly obscures intent. You can simplify to:

-            for psm_idx, psm in psms_by_specid[spectrum_id]:
+            for _, psm in psms_by_specid[spectrum_id]:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
quantmsrescore/alphapeptdeep.py(18 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
quantmsrescore/alphapeptdeep.py (2)
quantmsrescore/openms.py (1)
- OpenMSHelper (41-632)

quantmsrescore/ms2_model_manager.py (3)
- MS2ModelManager (12-173)
- save_ms2_model (172-173)
- ms2_fine_tuning (42-59)
🪛 Ruff (0.14.7)
quantmsrescore/alphapeptdeep.py
813-817: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
813-817: Avoid specifying long messages outside the exception class
(TRY003)
824-824: Loop control variable psm_idx not used within loop body
(B007)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build (3.11)
- GitHub Check: build
🔇 Additional comments (6)
quantmsrescore/alphapeptdeep.py (6)
463-484: Model-manager reuse and transfer-learning flag wiring looks sound.
Using `_get_model_manager_and_transfer_flag()` here avoids re-downloading/re-initializing models and correctly disables further transfer learning once `_peptdeep_model` is set. Storing `model_weights` into `_peptdeep_model` for reuse on subsequent runs is also consistent with how `ms2_fine_tune` returns the manager.
655-675: Reuse of `_get_model_manager_and_transfer_flag` in `_find_best_ms2_model` is consistent.
This call path now uses the same model manager / transfer-learning selection logic as validation and feature addition, and updates `_peptdeep_model` with the returned manager, which will avoid redundant fine-tuning on subsequent invocations.
710-719: Helper `_get_model_manager_and_transfer_flag` is a good extraction.
The helper cleanly encapsulates "reuse retrained model vs. create new manager" and avoids code duplication across three call sites. The behavior of returning `transfer_learning=False` once `_peptdeep_model` is set matches the intended one-time fine-tuning design.
722-759: `custom_correlate` refactor to return `(results, model_mgr)` looks correct.
The updated signature passes the `ModelManager` instance through to `ms2_fine_tune`, threads `ms2_tolerance_unit` and transfer-learning parameters, and now returns both results and the (possibly) fine-tuned manager. The call sites have been updated to unpack this tuple, so this change is internally consistent.
907-909: `_get_targets_for_psm` call correctly propagates `ms2_tolerance_unit`.
The call site now passes `ms2_tolerance_unit` alongside `ms2_tolerance`, aligning it with the updated `_get_targets_for_psm` signature and ensuring ppm-vs-Da behavior is honored during evaluation.
1085-1128: `_get_targets_df_for_psm` implementation is consistent with ppm/Da logic and training needs.
Flattening `mz_df`, matching with per-peak tolerances derived from `ms2_tolerance_unit`, reshaping back to the original shape, and then normalizing by the maximum intensity is a clean way to prepare training targets for the MS2 model. This helper nicely mirrors `_get_targets_for_psm` and fits the fine-tuning workflow.
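For readers unfamiliar with that matching step, a simplified, hedged sketch of the idea in plain NumPy (not the actual helper; column handling and edge cases are omitted):

```python
import numpy as np

def match_intensities(theo_mz, spec_mz, spec_int, tol, unit="Da"):
    """For each theoretical m/z, return the intensity of the closest observed peak
    within the tolerance window (0.0 when nothing matches), normalized to the max."""
    theo_mz = np.asarray(theo_mz, dtype=float)
    spec_mz = np.asarray(spec_mz, dtype=float)
    spec_int = np.asarray(spec_int, dtype=float)

    # Per-peak absolute windows: constant for Da, m/z-scaled for ppm.
    tols = theo_mz * tol * 1e-6 if unit.lower() == "ppm" else np.full_like(theo_mz, tol)

    matched = np.zeros_like(theo_mz)
    for i, (mz, window) in enumerate(zip(theo_mz, tols)):
        diffs = np.abs(spec_mz - mz)
        j = int(np.argmin(diffs))
        if diffs[j] <= window:
            matched[i] = spec_int[j]

    peak = matched.max()
    return matched / peak if peak > 0 else matched
```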
+        model_mgr = MS2ModelManager(
+            mask_modloss=not self._consider_modloss,
+            device="cpu",
+            model_dir=self.model_dir
+        )

         alphapeptdeep_results = custom_correlate(
             psms=psm_list_run,
             spectrum_file=str(spectrum_filename),
             psms_df=psms_df,
             spectrum_id_pattern=self.spectrum_id_pattern,
-            model=self.model,
+            model=model_mgr,
             ms2_tolerance=self.ms2_tolerance,
+            ms2_tolerance_unit=self.ms2_tolerance_unit,
             compute_correlations=False,
             model_dir=self.model_dir,
             processes=self.processes,
             consider_modloss=self.consider_modloss
         )
Fix use of undefined attribute _consider_modloss in AlphaPeptDeepFeatureGenerator.add_features.
AlphaPeptDeepFeatureGenerator.__init__ sets self.consider_modloss, but add_features uses self._consider_modloss when constructing MS2ModelManager. This will raise AttributeError whenever AlphaPeptDeepFeatureGenerator.add_features is used directly.
Apply:
- model_mgr = MS2ModelManager(
- mask_modloss=not self._consider_modloss,
- device="cpu",
- model_dir=self.model_dir
- )
+ model_mgr = MS2ModelManager(
+ mask_modloss=not self.consider_modloss,
+ device="cpu",
+ model_dir=self.model_dir,
+ )🤖 Prompt for AI Agents
In quantmsrescore/alphapeptdeep.py around lines 184 to 201, add_features
references a non-existent attribute self._consider_modloss when constructing
MS2ModelManager; change that to use the existing initializer attribute
self.consider_modloss so the call becomes MS2ModelManager(mask_modloss=not
self.consider_modloss, ...) ensuring consistency with the rest of the method and
avoiding AttributeError.
def ms2_fine_tune(enumerated_psm_list, psms_df, spec_file, spectrum_id_pattern, model_mgr,
                  ms2_tolerance, ms2_tolerance_unit, processes, consider_modloss, higher_score_better,
                  calibration_set_size, transfer_learning, transfer_learning_test_ratio, epoch_to_train_ms2):
    if consider_modloss:
        frag_types = ['b_z1', 'y_z1', 'b_z2', 'y_z2',
                      'b_modloss_z1', 'b_modloss_z2',
                      'y_modloss_z1', 'y_modloss_z2']
    else:
        model_mgr = ModelManager(mask_modloss=not consider_modloss, device="cpu")
        model_mgr.load_installed_models(model)
        if consider_modloss:
            predictions = model_mgr.predict_all(precursor_df=psms_df, predict_items=["ms2"],
                                                frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2',
                                                            'b_modloss_z1', 'b_modloss_z2',
                                                            'y_modloss_z1', 'y_modloss_z2'],
                                                process_num=processes)
        else:
            predictions = model_mgr.predict_all(precursor_df=psms_df, predict_items=["ms2"],
                                                frag_types=['b_z1', 'y_z1', 'b_z2', 'y_z2'],
                                                process_num=processes)
        precursor_df, predict_int_df, theoretical_mz_df = predictions["precursor_df"], predictions[
            "fragment_intensity_df"], predictions["fragment_mz_df"]
        frag_types = ['b_z1', 'y_z1', 'b_z2', 'y_z2']

    if transfer_learning:
        enumerated_psm_list_copy = enumerated_psm_list.copy()
        # Select only PSMs that are target and not decoys
        enumerated_psm_list_copy = [
            psm
            for psm in enumerated_psm_list_copy
            if not psm.is_decoy and psm.rank == 1
        ]
        # Sort alphapeptdeep results by PSM score and lower score is better
        enumerated_psm_list_copy.sort(key=lambda x: x.score, reverse=higher_score_better)

        # Get a calibration set, the % of psms to be used for calibration is defined by calibration_set_size
        calibration_set = enumerated_psm_list_copy[
            : int(len(enumerated_psm_list_copy) * calibration_set_size)
        ]

        precursor_df = psms_df[
            psms_df["provenance_data"].isin([next(iter(psm.provenance_data.keys())) for psm in calibration_set])]

        theoretical_mz_df = create_fragment_mz_dataframe(precursor_df, frag_types)

        precursor_df = precursor_df.set_index("provenance_data")

        # Compile regex for spectrum ID matching
        try:
            spectrum_id_regex = re.compile(spectrum_id_pattern)
        except TypeError:
            spectrum_id_regex = re.compile(r"(.*)")

        # Organize PSMs by spectrum ID
        psms_by_specid = _organize_psms_by_spectrum_id(calibration_set)

        match_intensity_df = []
        current_index = 0
        # Process each spectrum
        for spectrum in read_spectrum_file(spec_file):

            # Match spectrum ID with provided regex
            match = spectrum_id_regex.search(spectrum.identifier)
            try:
                spectrum_id = match[1]
            except (TypeError, IndexError):
                raise exceptions.TitlePatternError(
                    f"Spectrum title pattern `{spectrum_id_pattern}` could not be matched to "
                    f"spectrum ID `{spectrum.identifier}`. "
                    " Are you sure that the regex contains a capturing group?"
                )

            # Skip if no matching PSMs
            if spectrum_id not in psms_by_specid:
                continue

            # Process each PSM for this spectrum
            for psm_idx, psm in psms_by_specid[spectrum_id]:
                row_index = next(iter(psm.provenance_data.keys()))
                row = precursor_df.loc[row_index]
                mz = theoretical_mz_df.iloc[row["frag_start_idx"]:row["frag_stop_idx"], ]
                match_intensity = _get_targets_df_for_psm(
                    mz, spectrum, ms2_tolerance, ms2_tolerance_unit
                )
                fragment_len = match_intensity.shape[0]
                precursor_df.loc[row_index, "match_start_idx"] = current_index
                precursor_df.loc[row_index, "match_stop_idx"] = current_index + fragment_len
                match_intensity_df.append(match_intensity)
                current_index += fragment_len

        match_intensity_df = pd.concat(match_intensity_df, ignore_index=True)

        psm_num_to_train_ms2 = int(len(calibration_set) * (1 - transfer_learning_test_ratio))
        psm_num_to_test_ms2 = len(calibration_set) - psm_num_to_train_ms2

        precursor_df.drop(columns=["frag_start_idx", "frag_stop_idx"], inplace=True)
        precursor_df.rename(columns={
            "match_start_idx": "frag_start_idx",
            "match_stop_idx": "frag_stop_idx"
        }, inplace=True)
        precursor_df["frag_start_idx"] = precursor_df["frag_start_idx"].astype("int64")
        precursor_df["frag_stop_idx"] = precursor_df["frag_stop_idx"].astype("int64")

        model_mgr.ms2_fine_tuning(precursor_df,
                                  match_intensity_df,
                                  psm_num_to_train_ms2=psm_num_to_train_ms2,
                                  psm_num_to_test_ms2=psm_num_to_test_ms2,
                                  train_verbose=True,
                                  epoch_to_train_ms2=epoch_to_train_ms2)
Guard against empty calibration sets and unmatched spectra in ms2_fine_tune.
When transfer_learning is True, several edge cases can break:
- If `len(enumerated_psm_list_copy) * calibration_set_size < 1`, `calibration_set` is empty, so `match_intensity_df` stays empty and `pd.concat(match_intensity_df, ...)` (Line 838) raises `ValueError`.
- Even with a non-empty `calibration_set`, if no spectra match (e.g., due to `spectrum_id_pattern` issues), `match_intensity_df` is still empty and you hit the same `pd.concat` failure.
This makes small or edge-case datasets crash instead of gracefully skipping fine-tuning.
Consider:
- calibration_set = enumerated_psm_list_copy[
- : int(len(enumerated_psm_list_copy) * calibration_set_size)
- ]
-
- precursor_df = psms_df[
- psms_df["provenance_data"].isin([next(iter(psm.provenance_data.keys())) for psm in calibration_set])]
+ calibration_len = int(len(enumerated_psm_list_copy) * calibration_set_size)
+ calibration_set = enumerated_psm_list_copy[:calibration_len]
+
+ if not calibration_set:
+ logging.info(
+ "No PSMs available for MS2 transfer learning calibration; "
+ "skipping fine-tuning."
+ )
+ else:
+ precursor_df = psms_df[
+ psms_df["provenance_data"].isin(
+ [next(iter(psm.provenance_data.keys())) for psm in calibration_set]
+ )
+ ]
...
- match_intensity_df = pd.concat(match_intensity_df, ignore_index=True)
+ if not match_intensity_df:
+ logging.info(
+ "No matched fragment intensities found for calibration set; "
+ "skipping MS2 fine-tuning."
+ )
+ else:
+ match_intensity_df = pd.concat(match_intensity_df, ignore_index=True)
+ psm_num_to_train_ms2 = int(len(calibration_set) * (1 - transfer_learning_test_ratio))
+ psm_num_to_test_ms2 = len(calibration_set) - psm_num_to_train_ms2
+ ...
+ model_mgr.ms2_fine_tuning(
+ precursor_df,
+ match_intensity_df,
+ psm_num_to_train_ms2=psm_num_to_train_ms2,
+ psm_num_to_test_ms2=psm_num_to_test_ms2,
+ train_verbose=True,
+ epoch_to_train_ms2=epoch_to_train_ms2,
+ )This keeps prediction behavior intact while avoiding crashes when calibration data is insufficient.
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.14.7)
813-817: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
813-817: Avoid specifying long messages outside the exception class
(TRY003)
824-824: Loop control variable psm_idx not used within loop body
(B007)
🤖 Prompt for AI Agents
In quantmsrescore/alphapeptdeep.py around lines 761-856 the transfer_learning
branch can produce an empty calibration_set or no matched spectra so
match_intensity_df stays empty and pd.concat(...) (line ~838) will raise a
ValueError; before concatenating and before calling model_mgr.ms2_fine_tuning,
check if calibration_set is non-empty and that match_intensity_df contains at
least one DataFrame (or rows); if empty, skip fine-tuning (return or continue)
and optionally log/warn, and only compute
psm_num_to_train_ms2/psm_num_to_test_ms2 and call model_mgr.ms2_fine_tuning when
there is sufficient data (e.g., at least one training PSM), ensuring types are
safe (ints) to avoid downstream errors.
Actionable comments posted: 0
♻️ Duplicate comments (1)
quantmsrescore/annotator.py (1)
486-503: Fallback logic is broken: validating the same model twice.
This issue was flagged in a previous review and remains unresolved. The problem:
- Line 488 reassigns original_model = alphapeptdeep_best_model
- Lines 492-494 validate alphapeptdeep_best_model
- If validation fails, line 499 validates alphapeptdeep_best_model again (since original_model now equals alphapeptdeep_best_model)
The fallback should validate the actual original model (e.g., "generic"), not the best model again.
else: - logger.info("MS2PIP model don't support ppm tolerance unit. Only consider AlphaPeptDeep model") - original_model = alphapeptdeep_best_model + logger.info("MS2PIP model doesn't support ppm tolerance unit. Only consider AlphaPeptDeep model") + # Keep original_model as "generic" for fallback purposes if alphapeptdeep_best_corr > ms2pip_best_corr: model_to_use = alphapeptdeep_best_model # AlphaPeptdeep only has generic model if alphapeptdeep_best_model and alphapeptdeep_generator.validate_features(psm_list=psm_list, psms_df=psms_df, model=alphapeptdeep_best_model): logger.info( f"Using best model: {alphapeptdeep_best_model} with correlation: {alphapeptdeep_best_corr:.4f}") else: # Fallback to original model if best model doesn't validate - if alphapeptdeep_generator.validate_features(psm_list, psms_df, model=alphapeptdeep_best_model): + if alphapeptdeep_generator.validate_features(psm_list, psms_df, model=original_model): logger.warning("Best model validation failed, falling back to original model") + model_to_use = original_model else: logger.error("Both best model and original model validation failed") return # Exit early since no valid model is available
🧹 Nitpick comments (3)
quantmsrescore/annotator.py (3)
74-83: Missing docstring fortransfer_learningparameter.The new
transfer_learningparameter is added to the constructor signature but is not documented in the docstring alongside the other new parameters.ions are mostly useful for phospho MS2 prediction model. Defaults to True. + transfer_learning: bool, optional + Enable transfer learning for AlphaPeptDeep MS2 model. + Defaults to False. transfer_learning_test_ratio: float, optional The ratio of test data for MS2 transfer learning. Defaults to 0.3.
353-355: Uselogging.exceptionto include traceback in error logs.When catching exceptions,
logging.exceptionautomatically includes the traceback which aids debugging.except Exception as e: - logger.error(f"Failed to apply AlphaPeptDeep annotation: {e}") + logger.exception(f"Failed to apply AlphaPeptDeep annotation: {e}") return # Indicate failure through early return
449-451: Uselogging.exceptionfor consistent error logging.Similar to the earlier exception handler, use
logging.exceptionto capture the full traceback.except Exception as e: - logger.error(f"Failed to initialize MS2PIP: {e}") + logger.exception(f"Failed to initialize MS2PIP: {e}") raise
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
quantmsrescore/annotator.py(11 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
quantmsrescore/annotator.py (4)
quantmsrescore/alphapeptdeep.py (2)
_find_best_ms2_model(629-681)validate_features(437-500)quantmsrescore/ms2pip.py (2)
_find_best_ms2pip_model(413-460)validate_features(245-296)quantmsrescore/openms.py (1)
validate_features(390-415)quantmsrescore/idxmlreader.py (2)
psms_df(102-104)psms_df(107-111)
🪛 Ruff (0.14.7)
quantmsrescore/annotator.py
353-353: Do not catch blind exception: Exception
(BLE001)
354-354: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
450-450: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build
- GitHub Check: build (3.11)
🔇 Additional comments (4)
quantmsrescore/annotator.py (4)
105-115: LGTM!The conditional logic correctly handles the MS2PIP/AlphaPeptDeep selection based on tolerance unit, with an appropriate warning when ppm is requested but MS2PIP doesn't support it.
359-393: LGTM!The new tolerance_unit parameter and transfer-learning-related parameters are correctly wired through to the
AlphaPeptDeepAnnotator.
350-351: LGTM!Logging the actual
_peptdeep_modelobject provides more meaningful information about the model state after potential transfer learning, rather than just the string name.
517-520: Defensive fallback check is acceptable but could be clearer.The
original_model != "generic"check is defensive—MS2PIP models are never "generic", so this condition should always be true in this branch. The code is functionally correct since this else branch is only reachable in Da mode whereoriginal_modelcomes fromms2pip_generator.model.
|
@ypriverol I've opened a new pull request, #42, to work on those changes. Once the pull request is ready, I'll request review from you. |
Co-authored-by: ypriverol <52113+ypriverol@users.noreply.github.com>
Co-authored-by: ypriverol <52113+ypriverol@users.noreply.github.com>
Co-authored-by: ypriverol <52113+ypriverol@users.noreply.github.com>
Co-authored-by: ypriverol <52113+ypriverol@users.noreply.github.com>
Fix uninitialized ms2pip_generator access in ppm tolerance mode
Actionable comments posted: 0
🧹 Nitpick comments (5)
quantmsrescore/annotator.py (5)
27-35: Synchronize __init__ docstring with new parameters and defaults
The constructor signature now includes ms2_tolerance_unit, transfer_learning, transfer_learning_test_ratio, save_retrain_model, and epoch_to_train_ms2, but the docstring:
- Doesn't document ms2_tolerance_unit or transfer_learning.
- Still uses the example "ms2pip,deeplc" while alphapeptdeep is also supported.
- States consider_modloss "Defaults to True." while the default value on Line 33 is False.
Consider updating the docstring so public API docs match the actual signature and behavior. For example:
- feature_generators : str - Comma-separated list of feature generators (e.g., "ms2pip,deeplc"). + feature_generators : str + Comma-separated list of feature generators (e.g., "ms2pip,deeplc,alphapeptdeep"). @@ - ms2_tolerance : float, optional - MS2 tolerance for feature generation (default: 0.05). + ms2_tolerance : float, optional + MS2 tolerance for feature generation (default: 0.05). + ms2_tolerance_unit : str, optional + Unit of the MS2 tolerance, either "Da" or "ppm" (default: "Da"). @@ - consider_modloss: bool, optional - If modloss ions are considered in the ms2 model. `modloss` - ions are mostly useful for phospho MS2 prediction model. - Defaults to True. + consider_modloss: bool, optional + If modloss ions are considered in the MS2 model. `modloss` + ions are mostly useful for phospho MS2 prediction models. + Defaults to False. + transfer_learning: bool, optional + Enable AlphaPeptDeep MS2 transfer learning on the current dataset. + Defaults to False. @@ - epoch_to_train_ms2: int, optional - Epochs to train AlphaPeptDeep MS2 model. - Defaults to 20. + epoch_to_train_ms2: int, optional + Number of epochs to train the AlphaPeptDeep MS2 model. + Defaults to 20.This keeps the CLI/API help aligned with what the code actually does.
Also applies to: 71-83
105-115: Validate ms2_tolerance_unit early to avoid silent misconfiguration
The initialization logic for _ms2pip/_alphapeptdeep and _ms2_tolerance_unit assumes the unit is either "Da" or "ppm". A typo like "da" or "PPM" would:
- Set self._ms2pip = False.
- Only enable AlphaPeptDeep when explicitly requested (or via the "ms2pip" + "ppm" special case), otherwise silently drop MS2 features.
Consider validating ms2_tolerance_unit up front and failing fast on unsupported values:
.upper()) before comparison.Also applies to: 127-127, 140-143
353-355: Tighten exception handling: avoid blind Exception and use logger.exception
In several places you catch Exception and log with logger.error:
- Line 353 in _run_alphapeptdeep_annotation
- Line 493 when initializing MS2PIP in _find_and_apply_ms2_model
- Line 569 when applying MS2 annotation
Catching Exception makes it easy to hide programming errors, and logger.error loses stack traces that are very valuable when debugging model/IO issues. At minimum, prefer logger.exception so the traceback is captured, and consider narrowing the exception types (e.g., to IO/model-specific exceptions) where feasible. For example:
Also applies to: 493-494, 569-571
359-360: Update_create_alphapeptdeep_annotatordocstring to reflect new argumentsThe
_create_alphapeptdeep_annotatorhelper now acceptstolerance,tolerance_unit, and wires several transfer-learning-related options intoAlphaPeptDeepAnnotator, but the docstring still only documents themodelparameter.To avoid confusing future readers, either:
- Expand the docstring to mention
tolerance/tolerance_unitand briefly note that transfer-learning options are taken from instance state, or- Simplify the docstring to focus on the high-level purpose (“construct a configured AlphaPeptDeepAnnotator”) without a partial parameter list.
No behavior change needed; this is purely documentation hygiene.
Also applies to: 369-373, 377-377, 388-393
481-559: Da vs ppm control flow in_find_and_apply_ms2_modelis much cleaner; a couple of minor nitsThe refactored
_find_and_apply_ms2_modelnow:
- Always initializes
alphapeptdeep_generator, and only initializesms2pip_generatorwhenself._ms2_tolerance_unit == "Da".- For
"ppm", correctly skips MS2PIP, logs that only AlphaPeptDeep is considered, and delegates to_apply_alphapeptdeep_model.- For
"Da", comparesalphapeptdeep_best_corrvsms2pip_best_corrand uses_apply_alphapeptdeep_modelor the MS2PIP branch accordingly.This addresses the earlier review concerns about uninitialized
original_model/ms2pip_generatorin the ppm path and mismatched fallbacks between AlphaPeptDeep and MS2PIP.A couple of small cleanups to consider:
Outdated docstring parameter
The docstring still lists a
psm_list : PSMListparameter even though_find_and_apply_ms2_modelnow takes onlyselfand pulls PSMs fromself._idxml_reader. You could drop that parameter section or update it to describe the method as operating on the currently loaded dataset.Redundant
"generic"special-casing in MS2PIP fallbackIn the MS2PIP fallback you have:
if ms2pip_generator.validate_features(psm_list, model=original_model if original_model != "generic" else "HCD2021"): ... model_to_use = original_model if original_model != "generic" else "HCD2021"Here
original_modelcomes fromms2pip_generator.model, which is derived fromself._ms2_modeland will not be"generic"(that name is used on the AlphaPeptDeep side). You can simplify this to just useoriginal_model:
if ms2pip_generator.validate_features(psm_list,model=original_model if original_model != "generic" else "HCD2021"):
if ms2pip_generator.validate_features(psm_list, model=original_model):@@
model_to_use = original_model if original_model != "generic" else "HCD2021"
model_to_use = original_modelThese tweaks would further reduce confusion without altering the main behavior.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
quantmsrescore/annotator.py(11 hunks)
🧰 Additional context used
🪛 Ruff (0.14.7)
quantmsrescore/annotator.py
353-353: Do not catch blind exception: Exception
(BLE001)
354-354: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
493-493: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build (3.11)
- GitHub Check: build
🔇 Additional comments (1)
quantmsrescore/annotator.py (1)
425-467: AlphaPeptDeep best-model selection helper looks consistent and robustThe new
_apply_alphapeptdeep_modelcleanly encapsulates:
- Validation of
alphapeptdeep_best_modelvs the providedoriginal_model.- Fallback behavior when the best model fails validation.
- Application of the selected model plus setting
self.ms2_generator = "AlphaPeptDeep"so downstream feature renaming works correctly.The control flow (returning
Falsewhen both models fail validation) is straightforward and makes_find_and_apply_ms2_modeleasier to reason about. No changes requested here.
Pull request overview
Copilot reviewed 10 out of 11 changed files in this pull request and generated 15 comments.
Comments suppressed due to low confidence (1)
quantmsrescore/alphapeptdeep.py:424
- The
super().__init__()call has several issues: 1) Line 415 passesargsinstead of*args, 2) Line 423 passeskwargs=kwargswhich creates a named parameter instead of unpacking with**kwargs, and 3)consider_modlossis not passed to the parent class constructor even though it's a parameter in the parent's__init__(line 44). This means the parent class won't have theconsider_modlossattribute set, which could cause AttributeError or use wrong values. Fix: Change line 415 to*args, line 423 to**kwargs, and addconsider_modloss=consider_modlossto the super().init() call.
super().__init__(
args,
model=model,
ms2_tolerance=ms2_tolerance,
ms2_tolerance_unit=ms2_tolerance_unit,
spectrum_path=spectrum_path,
spectrum_id_pattern=spectrum_id_pattern,
model_dir=model_dir,
processes=processes,
kwargs=kwargs,
)
| @click.option("--transfer_learning", | ||
| help="Enabling transfer learning for AlphaPeptDeep MS2 prediction", | ||
| is_flag=True) | ||
| @click.option("--transfer_learning_test_ratio", | ||
| help="The ratio of test data for MS2 transfer learning", | ||
| default=0.30) | ||
| @click.option("--save_retrain_model", | ||
| help="Save retrained AlphaPeptDeep MS2 model weights", | ||
| is_flag=True) | ||
| @click.option("--epoch_to_train_ms2", | ||
| help="Epoch to train AlphaPeptDeep MS2 model", | ||
| type=int, | ||
| default=20) |
The Click options added for transfer learning (lines 119-131) don't follow the same naming convention as other options in this file. Other options use snake_case names (e.g., --ms2_tolerance, --calibration_set_size), but these new options could benefit from more descriptive names. For consistency and clarity, consider: --transfer_learning_enabled, --transfer_learning_epochs (instead of --epoch_to_train_ms2), and --transfer_learning_save_model (instead of --save_retrain_model).
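If the maintainers adopt that suggestion, a minimal sketch of the renamed flags could look like the following. The option names here are the reviewer's proposal, not what the PR currently defines:

```python
# Hypothetical renaming sketch; the current PR uses --transfer_learning,
# --epoch_to_train_ms2 and --save_retrain_model instead.
import click


@click.command()
@click.option("--transfer_learning_enabled",
              help="Enable transfer learning for AlphaPeptDeep MS2 prediction",
              is_flag=True)
@click.option("--transfer_learning_test_ratio",
              help="The ratio of test data for MS2 transfer learning",
              default=0.30)
@click.option("--transfer_learning_epochs",
              help="Number of epochs to train the AlphaPeptDeep MS2 model",
              type=int, default=20)
@click.option("--transfer_learning_save_model",
              help="Save retrained AlphaPeptDeep MS2 model weights",
              is_flag=True)
def demo(transfer_learning_enabled, transfer_learning_test_ratio,
         transfer_learning_epochs, transfer_learning_save_model):
    """Toy command used only to illustrate the proposed flag names."""
    click.echo(f"enabled={transfer_learning_enabled}, "
               f"epochs={transfer_learning_epochs}, "
               f"save={transfer_learning_save_model}")


if __name__ == "__main__":
    demo()
```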
| class MS2ModelManager(ModelManager): | ||
| def __init__(self, | ||
| mask_modloss: bool = False, | ||
| device: str = "gpu", | ||
| model_dir: str = None, | ||
| ): | ||
| self._train_psm_logging = True | ||
|
|
||
| self.ms2_model: pDeepModel = pDeepModel( | ||
| mask_modloss=mask_modloss, device=device | ||
| ) | ||
| self.rt_model: AlphaRTModel = AlphaRTModel(device=device) | ||
| self.ccs_model: AlphaCCSModel = AlphaCCSModel(device=device) | ||
|
|
||
| self.charge_model: ChargeModelForModAASeq = ChargeModelForModAASeq( | ||
| device=device | ||
| ) | ||
|
|
||
| if model_dir is not None and os.path.exists(os.path.join(model_dir, "ms2.pth")): | ||
| self.load_external_models(ms2_model_file=os.path.join(model_dir, "ms2.pth")) | ||
| self.model_str = model_dir | ||
| else: | ||
| _download_models(MODEL_ZIP_FILE_PATH) | ||
| self.load_installed_models() | ||
| self.model_str = "generic" | ||
| self.reset_by_global_settings(reload_models=False) | ||
|
|
||
| def __str__(self): | ||
| return self.model_str | ||
|
|
||
| def ms2_fine_tuning(self, psms_df: pd.DataFrame, | ||
| match_intensity_df: pd.DataFrame, | ||
| psm_num_to_train_ms2: int = 100000000, | ||
| use_grid_nce_search: bool = False, | ||
| top_n_mods_to_train: int = 10, | ||
| psm_num_per_mod_to_train_ms2: int = 50, | ||
| psm_num_to_test_ms2: int = 0, | ||
| epoch_to_train_ms2: int = 20, | ||
| train_verbose: bool = False): | ||
|
|
||
| self.psm_num_to_train_ms2 = psm_num_to_train_ms2 | ||
| self.use_grid_nce_search = use_grid_nce_search | ||
| self.top_n_mods_to_train = top_n_mods_to_train | ||
| self.psm_num_per_mod_to_train_ms2 = psm_num_per_mod_to_train_ms2 | ||
| self.psm_num_to_test_ms2 = psm_num_to_test_ms2 | ||
| self.train_verbose = train_verbose | ||
| self.epoch_to_train_ms2 = epoch_to_train_ms2 | ||
| self.train_ms2_model(psms_df, match_intensity_df) | ||
|
|
||
| def train_ms2_model( | ||
| self, | ||
| psm_df: pd.DataFrame, | ||
| matched_intensity_df: pd.DataFrame, | ||
| ): | ||
| """ | ||
| Using matched_intensity_df to train/fine-tune the ms2 model. | ||
|
|
||
| 1. It will sample `n=self.psm_num_to_train_ms2` PSMs into training dataframe (`tr_df`) for fine-tuning. | ||
| 2. This method will also consider some important PTMs (`n=self.top_n_mods_to_train`) into `tr_df` for fine-tuning. | ||
| 3. If `self.use_grid_nce_search==True`, this method will call `self.ms2_model.grid_nce_search` to find the best NCE and instrument. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| psm_df : pd.DataFrame | ||
| PSM dataframe for fine-tuning | ||
|
|
||
| matched_intensity_df : pd.DataFrame | ||
| The matched fragment intensities for `psm_df`. | ||
| """ | ||
| if self.psm_num_to_train_ms2 > 0: | ||
| if self.psm_num_to_train_ms2 < len(psm_df): | ||
| tr_df = psm_sampling_with_important_mods( | ||
| psm_df, | ||
| self.psm_num_to_train_ms2, | ||
| self.top_n_mods_to_train, | ||
| self.psm_num_per_mod_to_train_ms2, | ||
| ).copy() | ||
| else: | ||
| tr_df = psm_df | ||
| if len(tr_df) > 0: | ||
| tr_inten_df = pd.DataFrame() | ||
| for frag_type in self.ms2_model.charged_frag_types: | ||
| if frag_type in matched_intensity_df.columns: | ||
| tr_inten_df[frag_type] = matched_intensity_df[frag_type] | ||
| else: | ||
| tr_inten_df[frag_type] = 0.0 | ||
|
|
||
| if self.use_grid_nce_search: | ||
| self.nce, self.instrument = self.ms2_model.grid_nce_search( | ||
| tr_df, | ||
| tr_inten_df, | ||
| nce_first=model_mgr_settings["transfer"]["grid_nce_first"], | ||
| nce_last=model_mgr_settings["transfer"]["grid_nce_last"], | ||
| nce_step=model_mgr_settings["transfer"]["grid_nce_step"], | ||
| search_instruments=model_mgr_settings["transfer"][ | ||
| "grid_instrument" | ||
| ], | ||
| ) | ||
| tr_df["nce"] = self.nce | ||
| tr_df["instrument"] = self.instrument | ||
| else: | ||
| self.set_default_nce_instrument(tr_df) | ||
| else: | ||
| tr_df = pd.DataFrame() | ||
|
|
||
| if self.psm_num_to_test_ms2 > 0: | ||
| if len(tr_df) > 0: | ||
| test_psm_df = psm_df[~psm_df.sequence.isin(set(tr_df.sequence))].copy() | ||
| if len(test_psm_df) > self.psm_num_to_test_ms2: | ||
| test_psm_df = test_psm_df.sample(n=self.psm_num_to_test_ms2) | ||
| elif len(test_psm_df) == 0: | ||
| logging.info( | ||
| "No enough PSMs for testing MS2 models, " | ||
| "please reduce the `psm_num_to_train_ms2` " | ||
| "value according to overall PSM numbers. " | ||
| ) | ||
| test_psm_df = [] | ||
| else: | ||
| test_psm_df = psm_df.copy() | ||
| tr_inten_df = pd.DataFrame() | ||
| for frag_type in self.ms2_model.charged_frag_types: | ||
| if frag_type in matched_intensity_df.columns: | ||
| tr_inten_df[frag_type] = matched_intensity_df[frag_type] | ||
| else: | ||
| tr_inten_df[frag_type] = 0.0 | ||
| self.set_default_nce_instrument(test_psm_df) | ||
| else: | ||
| test_psm_df = pd.DataFrame() | ||
|
|
||
| if len(test_psm_df) > 0: | ||
| logging.info( | ||
| "Testing pretrained MS2 model on testing df:\n" | ||
| + str(self.ms2_model.test(test_psm_df, tr_inten_df)) | ||
| ) | ||
| if len(tr_df) > 0: | ||
| if self._train_psm_logging: | ||
| logging.info( | ||
| f"{len(tr_df)} PSMs for MS2 model training/transfer learning" | ||
| ) | ||
| self.ms2_model.train( | ||
| tr_df, | ||
| fragment_intensity_df=tr_inten_df, | ||
| batch_size=self.batch_size_to_train_ms2, | ||
| epoch=self.epoch_to_train_ms2, | ||
| warmup_epoch=self.warmup_epoch_to_train_ms2, | ||
| lr=self.lr_to_train_ms2, | ||
| verbose=self.train_verbose, | ||
| ) | ||
| logging.info( | ||
| "Testing refined MS2 model on training df:\n" | ||
| + str(self.ms2_model.test(tr_df, tr_inten_df)) | ||
| ) | ||
| if len(test_psm_df) > 0: | ||
| logging.info( | ||
| "Testing refined MS2 model on testing df:\n" | ||
| + str(self.ms2_model.test(test_psm_df, tr_inten_df)) | ||
| ) | ||
|
|
||
| self.model_str = "retrained_model" | ||
|
|
||
| def save_ms2_model(self): | ||
| self.ms2_model.save("retrained_ms2.pth") |
The new MS2ModelManager class lacks test coverage. Given that this is a critical component for transfer learning functionality, tests should be added to verify initialization, fine-tuning behavior, model saving, and edge cases like empty dataframes or missing model files.
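As a starting point, a hedged sketch of one such test is shown below. It assumes the class is importable from quantmsrescore.ms2_model_manager and deliberately bypasses the heavy model loading in __init__, so it only exercises the delegation logic of ms2_fine_tuning; fuller tests would also cover save_ms2_model, empty dataframes, and a missing ms2.pth in model_dir:

```python
# Sketch only: constructs the manager without running __init__ (which would
# download/load models) and checks that ms2_fine_tuning records its settings
# and delegates to train_ms2_model.
import pandas as pd

from quantmsrescore.ms2_model_manager import MS2ModelManager


def test_ms2_fine_tuning_stores_settings_and_delegates(monkeypatch):
    mgr = MS2ModelManager.__new__(MS2ModelManager)  # skip heavy __init__

    captured = {}
    monkeypatch.setattr(
        mgr, "train_ms2_model",
        lambda psms, intensities: captured.update(psms=psms, intensities=intensities),
    )

    psms = pd.DataFrame({"sequence": ["PEPTIDE"]})
    intensities = pd.DataFrame({"b_z1": [0.5]})
    mgr.ms2_fine_tuning(psms, intensities,
                        psm_num_to_train_ms2=1,
                        psm_num_to_test_ms2=0,
                        epoch_to_train_ms2=2)

    assert mgr.psm_num_to_train_ms2 == 1
    assert mgr.epoch_to_train_ms2 == 2
    assert captured["psms"] is psms
```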
| ctx, | ||
| idxml: str, | ||
| mzml, | ||
| output: str, | ||
| log_level, | ||
| processes, | ||
| feature_generators, | ||
| force_model, | ||
| find_best_model, | ||
| only_features, | ||
| ms2_model, | ||
| ms2_model_dir, | ||
| ms2_tolerance, | ||
| ms2_tolerance_unit, | ||
| calibration_set_size, | ||
| valid_correlations_size, | ||
| skip_deeplc_retrain, | ||
| spectrum_id_pattern: str, | ||
| psm_id_pattern: str, | ||
| consider_modloss, | ||
| transfer_learning, | ||
| transfer_learning_test_ratio, | ||
| save_retrain_model, | ||
| epoch_to_train_ms2 | ||
| ): |
The function msrescore2feature now has 18 parameters, making it difficult to maintain and understand. Consider grouping related parameters into configuration objects (e.g., TransferLearningConfig, MS2ToleranceConfig) or using a more structured approach like dataclasses to improve code organization and maintainability.
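One possible shape for such a grouping, purely as an illustration (the TransferLearningConfig name and its fields are hypothetical, not part of this PR):

```python
# Hypothetical config object; field names mirror the current CLI flags.
from dataclasses import dataclass


@dataclass
class TransferLearningConfig:
    enabled: bool = False
    test_ratio: float = 0.30
    save_retrained_model: bool = False
    epochs: int = 20


def run_annotation(tl: TransferLearningConfig) -> None:
    # Downstream code reads one grouped object instead of four loose parameters.
    if tl.enabled:
        print(f"Fine-tuning MS2 model for {tl.epochs} epochs "
              f"(test ratio {tl.test_ratio})")


run_annotation(TransferLearningConfig(enabled=True))
```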
quantmsrescore/annotator.py
Outdated
| else: | ||
| self._ms2pip = False | ||
| if "alphapeptdeep" in feature_annotators: | ||
| self._alphapeptdeep = True | ||
| elif "ms2pip" in feature_annotators and ms2_tolerance_unit == "ppm": | ||
| self._alphapeptdeep = True | ||
| logger.warning("MS2PIP only supports Da units. AlphaPeptDeep is enabled when setting ppm tolerance!") |
The logic here silently overrides the user's feature generator choice when they specify "ms2pip" with ppm tolerance. If a user explicitly requests "ms2pip" but provides ppm tolerance, the code enables AlphaPeptDeep instead with only a warning. This could be confusing. Consider either: 1) raising an error to force the user to fix their configuration, or 2) automatically adjusting the feature_generators string to reflect what's actually being used, so downstream logic and logging are consistent.
| else: | |
| self._ms2pip = False | |
| if "alphapeptdeep" in feature_annotators: | |
| self._alphapeptdeep = True | |
| elif "ms2pip" in feature_annotators and ms2_tolerance_unit == "ppm": | |
| self._alphapeptdeep = True | |
| logger.warning("MS2PIP only supports Da units. AlphaPeptDeep is enabled when setting ppm tolerance!") | |
| elif "ms2pip" in feature_annotators and ms2_tolerance_unit == "ppm": | |
| raise ValueError("MS2PIP only supports Da units. Please remove 'ms2pip' from feature_generators or set ms2_tolerance_unit to 'Da'.") | |
| else: | |
| self._ms2pip = False | |
| if "alphapeptdeep" in feature_annotators: | |
| self._alphapeptdeep = True |
quantmsrescore/annotator.py
Outdated
| logger.info("Running MS2PIP model") | ||
| ms2pip_best_model, ms2pip_best_corr = ms2pip_generator._find_best_ms2pip_model(calibration_set) | ||
| else: | ||
| logger.info("MS2PIP model don't support ppm tolerance unit. Only consider AlphaPeptDeep model") |
Incorrect grammar: "don't" should be "doesn't" (MS2PIP model is singular)
| logger.info("MS2PIP model don't support ppm tolerance unit. Only consider AlphaPeptDeep model") | |
| logger.info("MS2PIP model doesn't support ppm tolerance unit. Only consider AlphaPeptDeep model") |
quantmsrescore/annotator.py
Outdated
| try: | ||
| ms2pip_generator = self._create_ms2pip_annotator() | ||
| original_model = ms2pip_generator.model | ||
| model_to_use = original_model |
quantmsrescore/annotator.py
Outdated
| model_to_use = original_model | ||
| except Exception as e: | ||
| logger.error(f"Failed to initialize MS2PIP: {e}") | ||
| raise | ||
| else: | ||
| original_model = alphapeptdeep_generator.model | ||
| model_to_use = original_model |
This assignment to 'model_to_use' is unnecessary as it is redefined before this value is used.
| model_to_use = original_model | |
| except Exception as e: | |
| logger.error(f"Failed to initialize MS2PIP: {e}") | |
| raise | |
| else: | |
| original_model = alphapeptdeep_generator.model | |
| model_to_use = original_model | |
| except Exception as e: | |
| logger.error(f"Failed to initialize MS2PIP: {e}") | |
| raise | |
| else: | |
| original_model = alphapeptdeep_generator.model |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Actionable comments posted: 0
♻️ Duplicate comments (2)
quantmsrescore/alphapeptdeep.py (2)
184-188: Fix AttributeError: use self.consider_modloss instead of self._consider_modloss.
AlphaPeptDeepFeatureGenerator.__init__ sets self.consider_modloss (line 82), but line 185 references self._consider_modloss. This will raise AttributeError when this code executes. Apply this diff:
try: model_mgr = MS2ModelManager( - mask_modloss=not self._consider_modloss, + mask_modloss=not self.consider_modloss, device="cpu", model_dir=self.model_dir )
815-887: Guard against empty calibration set and unmatched spectra to prevent crashes.
When calibration_set is empty (e.g., len(enumerated_psm_list_copy) * calibration_set_size < 1) or when no spectra match, match_intensity_df remains an empty list, causing pd.concat at line 869 to raise ValueError. This crashes small or edge-case datasets instead of gracefully skipping fine-tuning. Add guards before concatenation and fine-tuning:
# Get a calibration set, the % of psms to be used for calibration is defined by calibration_set_size calibration_set = enumerated_psm_list_copy[ : int(len(enumerated_psm_list_copy) * calibration_set_size) ] + + if not calibration_set: + logger.info( + "No PSMs available for MS2 transfer learning calibration; " + "skipping fine-tuning and using pretrained model." + ) + else: precursor_df = psms_df[ psms_df["provenance_data"].isin([next(iter(psm.provenance_data.keys())) for psm in calibration_set])] ... - match_intensity_df = pd.concat(match_intensity_df, ignore_index=True) - - psm_num_to_train_ms2 = int(len(calibration_set) * (1 - transfer_learning_test_ratio)) - psm_num_to_test_ms2 = len(calibration_set) - psm_num_to_train_ms2 - - precursor_df.drop(columns=["frag_start_idx", "frag_stop_idx"], inplace=True) - precursor_df.rename(columns={ - "match_start_idx": "frag_start_idx", - "match_stop_idx": "frag_stop_idx" - }, inplace=True) - precursor_df["frag_start_idx"] = precursor_df["frag_start_idx"].astype("int64") - precursor_df["frag_stop_idx"] = precursor_df["frag_stop_idx"].astype("int64") - - model_mgr.ms2_fine_tuning(precursor_df, - match_intensity_df, - psm_num_to_train_ms2=psm_num_to_train_ms2, - psm_num_to_test_ms2=psm_num_to_test_ms2, - train_verbose=True, - epoch_to_train_ms2=epoch_to_train_ms2) + if not match_intensity_df: + logger.info( + "No matched fragment intensities found for calibration set; " + "skipping MS2 fine-tuning and using pretrained model." + ) + else: + match_intensity_df = pd.concat(match_intensity_df, ignore_index=True) + + psm_num_to_train_ms2 = int(len(calibration_set) * (1 - transfer_learning_test_ratio)) + psm_num_to_test_ms2 = len(calibration_set) - psm_num_to_train_ms2 + + precursor_df.drop(columns=["frag_start_idx", "frag_stop_idx"], inplace=True) + precursor_df.rename(columns={ + "match_start_idx": "frag_start_idx", + "match_stop_idx": "frag_stop_idx" + }, inplace=True) + precursor_df["frag_start_idx"] = precursor_df["frag_start_idx"].astype("int64") + precursor_df["frag_stop_idx"] = precursor_df["frag_stop_idx"].astype("int64") + + model_mgr.ms2_fine_tuning(precursor_df, + match_intensity_df, + psm_num_to_train_ms2=psm_num_to_train_ms2, + psm_num_to_test_ms2=psm_num_to_test_ms2, + train_verbose=True, + epoch_to_train_ms2=epoch_to_train_ms2)
🧹 Nitpick comments (2)
quantmsrescore/alphapeptdeep.py (2)
856-856: Replace unused loop variable with underscore.The loop variable
psm_idxis unpacked but never used in the loop body. Replace it with_to signal it's intentionally unused.# Process each PSM for this spectrum - for psm_idx, psm in psms_by_specid[spectrum_id]: + for _, psm in psms_by_specid[spectrum_id]:
842-849: Improve exception handling by chaining the original error.The exception raised in the
exceptblock doesn't chain the original error, losing context about what caused the pattern matching failure. Useraise ... from errto preserve the error chain.# Match spectrum ID with provided regex match = spectrum_id_regex.search(spectrum.identifier) try: spectrum_id = match[1] except (TypeError, IndexError): raise exceptions.TitlePatternError( f"Spectrum title pattern `{spectrum_id_pattern}` could not be matched to " f"spectrum ID `{spectrum.identifier}`. " " Are you sure that the regex contains a capturing group?" - ) + ) from NoneNote: Using
from Noneexplicitly suppresses the exception context if you prefer, or usefrom eif you catch the exception asexcept (TypeError, IndexError) as e:to preserve the chain.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
quantmsrescore/alphapeptdeep.py(18 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
quantmsrescore/alphapeptdeep.py (1)
quantmsrescore/ms2_model_manager.py (3)
MS2ModelManager(12-173)save_ms2_model(172-173)ms2_fine_tuning(42-59)
🪛 Ruff (0.14.7)
quantmsrescore/alphapeptdeep.py
845-849: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
845-849: Avoid specifying long messages outside the exception class
(TRY003)
856-856: Loop control variable psm_idx not used within loop body
(B007)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build
- GitHub Check: build (3.11)
- GitHub Check: Codacy Static Code Analysis
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Actionable comments posted: 0
♻️ Duplicate comments (1)
quantmsrescore/annotator.py (1)
470-478: Tidy up _find_and_apply_ms2_model docstring and logging message
- The docstring still documents a psm_list : PSMList parameter even though the method takes no arguments. That section should be removed or updated.
- Minor grammar nit in the log message: logger.info("MS2PIP model don't support ppm tolerance unit...") should be "doesn't support".
Also applies to: 519-536
🧹 Nitpick comments (4)
quantmsrescore/annotator.py (4)
27-36: init docstring is out of sync with the actual signature and defaultsThe parameter docs here have drifted from the constructor:
ms2_model_pathis documented as default"./"but the code default is"models".ms2_tolerance_unitandvalid_correlations_sizeare not documented at all.transfer_learningis documented as “bool, required”, but it has a default (False) and is therefore optional.Please update the docstring so that all parameters (including
ms2_tolerance_unitandvalid_correlations_size) are listed with correct types, defaults, and descriptions, and sotransfer_learningis described as optional with its default.Also applies to: 39-85
104-117: Validate and/or normalizems2_tolerance_unitto avoid silent misconfigurationRight now only the exact strings
"Da"and"ppm"are handled; any other value (e.g."da"or"ppm "), especially whenfeature_generatorscontains only"ms2pip", will silently disable MS2PIP and may not enable AlphaPeptDeep either, resulting in no MS2 generator being run.Consider:
- Normalizing the unit (e.g.
ms2_tolerance_unit = ms2_tolerance_unit.strip()and enforcing a canonical case), and- Raising a
ValueError(or at least logging a warning) if the normalized value is not one of{"Da", "ppm"}.This makes configuration errors fail fast instead of silently dropping MS2 features.
Also applies to: 129-145
320-359: Uselogger.exception(and optionally narrow the exception type) for AlphaPeptDeep failuresThe broad
except Exception as e:here logs onlystr(e)and then swallows the error with an early return. That makes debugging real programming errors (TypeError, KeyError, etc.) inside the AlphaPeptDeep path harder.At minimum, switch to
logger.exception("Failed to apply AlphaPeptDeep annotation")so the traceback is preserved, and consider narrowing the caught exception type if you mainly expect model/IO errors. The same reasoning applies to the similar catches in_find_and_apply_ms2_model.
361-375: Update_create_alphapeptdeep_annotatordocstring to reflect new parameters and return typeThe docstring still only documents the
modelparameter and says it returnsAlphaPeptDeep, but the signature now includestoleranceandtolerance_unit, and the function actually returns anAlphaPeptDeepAnnotator.Please extend the docstring to cover
toleranceandtolerance_unitand adjust the return section to nameAlphaPeptDeepAnnotator.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
quantmsrescore/annotator.py(11 hunks)
🧰 Additional context used
🪛 Ruff (0.14.7)
quantmsrescore/annotator.py
355-355: Do not catch blind exception: Exception
(BLE001)
356-356: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
495-495: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build (3.11)
- GitHub Check: build
🔇 Additional comments (1)
quantmsrescore/annotator.py (1)
427-468: Good extraction of shared AlphaPeptDeep validation/fallback logicFactoring the “best vs original model” validation, application, and
ms2_generatorassignment into_apply_alphapeptdeep_modelremoves duplication and makes_find_and_apply_ms2_modelmuch easier to follow. The earlyFalsereturn on double-validation failure is also a clear signal for the caller.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Actionable comments posted: 1
♻️ Duplicate comments (1)
quantmsrescore/ms2rescore.py (1)
133-157: Unusedctxand very long parameter list
ctxis not used, which triggers Ruff’sARG001. If you don’t need the context, consider dropping@click.pass_contextand the parameter, or simply renaming it to_ctxto signal intentional non‑use and silence the linter:-@click.pass_context -def msrescore2feature( - ctx, +@click.pass_context +def msrescore2feature( + _ctx, idxml: str, mzml, output: str, log_level, processes, feature_generators, force_model, find_best_model, only_features, ms2_model, ms2_model_dir, ms2_tolerance, ms2_tolerance_unit, calibration_set_size, valid_correlations_size, skip_deeplc_retrain, spectrum_id_pattern: str, psm_id_pattern: str, consider_modloss, transfer_learning, transfer_learning_test_ratio, save_retrain_model, epoch_to_train_ms2 ):
- The function now takes a large number of parameters, which hurts readability and maintainability. Longer‑term, consider grouping related settings (MS2 config, transfer‑learning config, paths) into small config objects or dataclasses and passing those instead.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
quantmsrescore/ms2rescore.py(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
quantmsrescore/ms2rescore.py (2)
quantmsrescore/annotator.py (3)
build_idxml_data(147-193)annotate(195-230)write_idxml_file(232-256)quantmsrescore/openms.py (1)
write_idxml_file(251-274)
🪛 Ruff (0.14.7)
quantmsrescore/ms2rescore.py
134-134: Unused function argument: ctx
(ARG001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (5)
quantmsrescore/ms2rescore.py (5)
76-80: Tolerance help text aligns with default valueThe updated help text for
--ms2_tolerancecorrectly reflects the0.05default; no functional issues here.
81-86: Newms2_tolerance_unitoption is well-scopedAdding a dedicated
ms2_tolerance_unitflag with a constrained choice (Da/ppm) and a sensible default is consistent with the rest of the CLI and keeps unit handling explicit.
119-131: Transfer‑learning CLI flags are consistent with existing optionsThe new flags for enabling transfer learning, test ratio, model saving, and epoch count follow the existing style (boolean flags and numeric ratios) and give users clear control over MS2 fine‑tuning behavior.
219-240: Propagation of new options intoFeatureAnnotatorlooks correctThe new arguments (
ms2_tolerance_unit,consider_modloss, and all transfer‑learning controls) are passed through as keyword arguments, which keeps the call site clear and robust against parameter reordering inFeatureAnnotator.__init__. Wiring appears complete and consistent with the CLI options.
245-245: Conditional output write is reasonableConditionally writing the idXML only when
outputis provided is a sensible safeguard and matches the optional nature of the--outputflag. No issues here.
| If modloss ions are considered in the ms2 model. `modloss` | ||
| ions are mostly useful for phospho MS2 prediciton model. | ||
| ions are mostly useful for phospho MS2 prediction model. | ||
| Defaults to True. | ||
| transfer_learning: bool, optional | ||
| Enabling transfer learning for AlphaPeptDeep MS2 prediction. | ||
| Defaults to False. | ||
| transfer_learning_test_ratio: float, optional | ||
| The ratio of test data for MS2 transfer learning. | ||
| Defaults to 0.3. | ||
| save_retrain_model: bool, optional | ||
| Save retrained MS2 model. | ||
| Defaults to False. | ||
| epoch_to_train_ms2: int, optional | ||
| Number of epochs to train AlphaPeptDeep MS2 model. Defaults to 20. |
Docstring: default mismatch and missing ms2_tolerance_unit entry
Two small inconsistencies in the docstring:
-
consider_modloss- Docstring says “Defaults to True.”
- The CLI flag is a simple
is_flag=Trueoption, which defaults toFalse.
Please align the documented default with the actual behavior (or change the option default ifTrueis what you intend).
-
ms2_tolerance_unit- There is a CLI option, but it’s not documented in the parameter list.
- Suggest adding something like:
- ms2_tolerance : float
- The tolerance for MS²PIP annotation.
+ ms2_tolerance : float
+ The tolerance for MS²PIP annotation.
+ ms2_tolerance_unit : str, optional
+ Unit for the fragment mass tolerance, e.g. "Da" or "ppm".This keeps the CLI behavior and Python docstring in sync for users and for generated docs.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In quantmsrescore/ms2rescore.py around lines 203 to 216, the docstring defaults
and parameter list are out of sync: update the docstring to reflect the actual
default for consider_modloss (CLI uses is_flag=True so default is False unless
you change the option) by either changing the written “Defaults to True.” to
“Defaults to False.” or change the CLI option default to True to match the
docstring; and add a documented parameter entry for ms2_tolerance_unit
describing its type, allowed values (e.g., 'ppm' or 'Da'), default value, and
brief purpose so the CLI option is reflected in the docstring and generated
docs.
Actionable comments posted: 1
♻️ Duplicate comments (3)
quantmsrescore/ms2_model_manager.py (2)
12-37: Missing super().__init__() call in subclass constructor.
MS2ModelManager extends ModelManager but doesn't call the parent constructor. This may cause AttributeError when accessing inherited attributes like batch_size_to_train_ms2, warmup_epoch_to_train_ms2, and lr_to_train_ms2 used in train_ms2_model (lines 154-157). Apply this diff to fix:
def __init__(self, mask_modloss: bool = False, device: str = "gpu", - model_dir: str = None, + model_dir: str | None = None, ): + super().__init__() self._train_psm_logging = True
129-136: Potential index mismatch between test_psm_df and tr_inten_df.
In this else branch, test_psm_df is copied from psm_df, but tr_inten_df is built from the full matched_intensity_df without filtering to match test_psm_df.index. When passed to self.ms2_model.test() at line 144, the indices may not align, leading to incorrect data pairing. Apply this diff to filter
matched_intensity_dfto the test indices:else: test_psm_df = psm_df.copy() + matched_intensity_subset = matched_intensity_df.loc[test_psm_df.index] tr_inten_df = pd.DataFrame() for frag_type in self.ms2_model.charged_frag_types: - if frag_type in matched_intensity_df.columns: - tr_inten_df[frag_type] = matched_intensity_df[frag_type] + if frag_type in matched_intensity_subset.columns: + tr_inten_df[frag_type] = matched_intensity_subset[frag_type] else: tr_inten_df[frag_type] = 0.0quantmsrescore/ms2rescore.py (1)
204-207: Docstring default mismatch forconsider_modloss.The docstring states "Defaults to True" but the CLI option uses
is_flag=True, which defaults toFalse. Update the docstring to reflect the actual behavior.consider_modloss: bool, optional If modloss ions are considered in the ms2 model. `modloss` ions are mostly useful for phospho MS2 prediction model. - Defaults to True. + Defaults to False.
🧹 Nitpick comments (4)
quantmsrescore/ms2_model_manager.py (1)
172-173: Consider making the save path configurable.The output filename is hardcoded to
"retrained_ms2.pth". For flexibility, consider accepting an optional path parameter or usingself.model_strto differentiate saved models.- def save_ms2_model(self): - self.ms2_model.save("retrained_ms2.pth") + def save_ms2_model(self, output_path: str = "retrained_ms2.pth"): + self.ms2_model.save(output_path)quantmsrescore/ms2rescore.py (1)
133-134: Unusedctxparameter.The
ctxparameter from@click.pass_contextis never used in the function body. Either remove@click.pass_contextdecorator or document why the context is needed for future use.If the context is not needed, remove the decorator and parameter:
-@click.pass_context def msrescore2feature( - ctx, idxml: str,quantmsrescore/annotator.py (2)
355-357: Consider usinglogger.exceptionfor better debugging.Using
logger.exceptioninstead oflogger.errorwill automatically include the stack trace, making it easier to diagnose failures.except Exception as e: - logger.error(f"Failed to apply AlphaPeptDeep annotation: {e}") + logger.exception(f"Failed to apply AlphaPeptDeep annotation: {e}") return # Indicate failure through early return
493-495: Consider usinglogger.exceptionfor better debugging.Same as the previous comment—using
logger.exceptionprovides stack traces for easier debugging.except Exception as e: - logger.error(f"Failed to initialize MS2PIP: {e}") + logger.exception(f"Failed to initialize MS2PIP: {e}") raise
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
quantmsrescore/annotator.py(11 hunks)quantmsrescore/ms2_model_manager.py(1 hunks)quantmsrescore/ms2rescore.py(5 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
quantmsrescore/annotator.py (3)
quantmsrescore/alphapeptdeep.py (5)
AlphaPeptDeepAnnotator(390-719)validate_features(437-500)add_features(160-211)add_features(502-556)_find_best_ms2_model(629-681)quantmsrescore/openms.py (1)
validate_features(390-415)quantmsrescore/ms2pip.py (3)
validate_features(245-296)add_features(298-340)_find_best_ms2pip_model(413-460)
quantmsrescore/ms2rescore.py (2)
quantmsrescore/annotator.py (3)
build_idxml_data(147-193)annotate(195-230)write_idxml_file(232-256)quantmsrescore/openms.py (1)
write_idxml_file(251-274)
🪛 Ruff (0.14.7)
quantmsrescore/ms2_model_manager.py
16-16: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
quantmsrescore/annotator.py
110-111: Avoid specifying long messages outside the exception class
(TRY003)
355-355: Do not catch blind exception: Exception
(BLE001)
356-356: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
494-494: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
quantmsrescore/ms2rescore.py
134-134: Unused function argument: ctx
(ARG001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Codacy Static Code Analysis
- GitHub Check: build (3.11)
- GitHub Check: build
🔇 Additional comments (6)
quantmsrescore/ms2rescore.py (2)
81-86: LGTM!The new
ms2_tolerance_unitoption is well-defined with aChoicevalidator ensuring only valid units (Daorppm) are accepted.
119-131: LGTM!The transfer learning CLI options are appropriately typed and provide sensible defaults. The flags follow the existing naming conventions in the file.
quantmsrescore/annotator.py (4)
107-117: LGTM!The tolerance unit validation correctly prevents MS2PIP from being used with ppm units, which is not supported. The clear error message helps users understand how to fix the configuration.
427-468: LGTM!The new
_validate_and_apply_alphapeptdeep_modelmethod consolidates validation and fallback logic cleanly, returning a boolean to indicate success/failure. This reduces code duplication in the calling code.
488-497: LGTM!The MS2PIP annotator is now correctly initialized only when
ms2_tolerance_unit == "Da", preventing initialization errors when ppm tolerance is used.
376-395: LGTM!The
_create_alphapeptdeep_annotatormethod correctly propagates the newtolerance_unitand transfer learning parameters to theAlphaPeptDeepAnnotatorconstructor.
| transfer_learning : bool, required | ||
| Whether to use MS2 transfer learning. Set to True to enable transfer learning for MS2 model. |
Docstring incorrectly marks transfer_learning as required.
The parameter has a default value of False (line 34), so it should be documented as optional, not required.
- transfer_learning : bool, required
- Whether to use MS2 transfer learning. Set to True to enable transfer learning for MS2 model.
+ transfer_learning : bool, optional
+ Whether to enable transfer learning for AlphaPeptDeep MS2 model.
+ Defaults to False.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| transfer_learning : bool, required | |
| Whether to use MS2 transfer learning. Set to True to enable transfer learning for MS2 model. | |
| transfer_learning : bool, optional | |
| Whether to enable transfer learning for AlphaPeptDeep MS2 model. | |
| Defaults to False. |
🤖 Prompt for AI Agents
In quantmsrescore/annotator.py around lines 75 to 76, the docstring wrongly
labels the parameter transfer_learning as "required" even though it has a
default value of False (line 34); update the docstring to mark transfer_learning
as "optional" and adjust the description to reflect that it defaults to False
and enables MS2 transfer learning when set to True.
PR Type
Enhancement
Description
Add MS2 transfer learning support for AlphaPeptDeep model
Implement PPM tolerance unit support alongside Da units
Create MS2ModelManager class for fine-tuning capabilities
Add transfer learning configuration parameters to CLI and annotator
Refactor custom_correlate function to support model manager objects
Diagram Walkthrough
File Walkthrough
__init__.py
Version bump to 0.0.13 (quantmsrescore/__init__.py)
pyproject.toml
Version bump to 0.0.13 (pyproject.toml)
alphapeptdeep.py
Add transfer learning and PPM tolerance support (quantmsrescore/alphapeptdeep.py)
- ms2_tolerance_unit parameter to support PPM and Da units
- MS2ModelManager class for transfer learning
- custom_correlate function to accept ModelManager objects instead of model strings
- validate_features and add_features methods: transfer_learning, transfer_learning_test_ratio, epoch_to_train_ms2
- ms2_fine_tune function replacing make_prediction with fine-tuning capabilities
- _get_targets_df_for_psm function for DataFrame-based target extraction, alongside _get_targets_for_psm
annotator.py: Add tolerance unit and transfer learning parameters (quantmsrescore/annotator.py)
- ms2_tolerance_unit parameter to FeatureAnnotator initialization
- _create_alphapeptdeep_annotator to pass tolerance unit and transfer learning parameters
- _find_and_apply_ms2_model to handle PPM tolerance by skipping MS2PIP
ms2_model_manager.py: New MS2ModelManager for transfer learning (quantmsrescore/ms2_model_manager.py)
- ms2_fine_tuning method for transfer learning configuration
- train_ms2_model method for model fine-tuning with PSM sampling
- save_ms2_model method to persist retrained model weights
ms2rescore.py: Add CLI options for transfer learning and tolerance units (quantmsrescore/ms2rescore.py)
- --ms2_tolerance_unit CLI option supporting Da and ppm choices
- --transfer_learning, --transfer_learning_test_ratio, --save_retrain_model, --epoch_to_train_ms2 options
- msrescore2feature function signature updated to accept new parameters
Summary by CodeRabbit
New Features
Chores
CI