Description
Issue Description
Feature Request: Add new QC metrics inspired by FragPipe generate_reports_pdf.py
Summary
Based on analysis of FragPipe's generate_reports_pdf.py QC script (suggested by the FragPipe development team), we have identified several valuable QC metrics that could be added to pmultiqc. This issue documents which metrics can be implemented for each supported platform (FragPipe, MaxQuant, DIA-NN, quantms).
Reference: https://github.com/Nesvilab/FragPipe/blob/develop/tools/generate_reports_pdf.py
New QC Metrics Overview
Metrics Already in pmultiqc
- ✅ Charge Distribution
- ✅ Missed Cleavages
- ✅ Delta Mass (Da/ppm)
- ✅ PSM/Peptide/Protein Counts
- ✅ IDs over RT
- ✅ Peptide Intensity
- ✅ Search Engine Scores
- ✅ Modifications
- ✅ Contaminants
- ✅ Heatmap Summary
Metrics NOT Yet in pmultiqc (Proposed)
| Metric | Description |
|---|---|
| Peptide Length Distribution | Histogram showing peptide length distribution |
| M/Z Distribution | Histogram of precursor m/z values |
| MS1/MS2 Mass Error Before/After Calibration | Grouped comparison of calibration effect |
| M/Z vs. Delta Mass Scatter | 2D scatter revealing m/z-dependent mass accuracy |
| RT vs. M/Z Scatter | 2D scatter showing chromatographic separation |
| Percolator Feature Weights | ML feature importance visualization |
| Number of Enzymatic Termini | Distribution of fully/semi-tryptic peptides |
| RT Calibration Quality | RT prediction accuracy plots |
Implementation Feasibility by Platform
Tier 1: Implement for ALL Platforms (High Priority)
1. Peptide Length Distribution
| Platform | Data Source | Column/Calculation | Status |
|---|---|---|---|
| FragPipe | `psm.tsv` | `Peptide Length` (direct) | ✅ Ready |
| MaxQuant | `evidence.txt` | `len(Sequence)` | ✅ Ready |
| DIA-NN | `report.tsv` | `len(Stripped.Sequence)` | ✅ Ready |
| quantms | mzTab | `len(sequence)` | ✅ Ready |
Value: Reveals digestion efficiency, identifies degradation or incomplete digestion
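As a rough illustration of the table above, the per-platform length calculation could be sketched as follows. The `LENGTH_SOURCE` mapping and the `peptide_length_histogram` helper are hypothetical names, not existing pmultiqc code, and rows are assumed to be already parsed into dicts (for mzTab, that means the PSM or peptide section has been extracted beforehand).

```python
import csv
import io
from collections import Counter

# Hypothetical mapping: column per platform and whether it already holds a length.
# Column names are taken from the table above.
LENGTH_SOURCE = {
    "fragpipe": ("Peptide Length", True),
    "maxquant": ("Sequence", False),
    "diann": ("Stripped.Sequence", False),
    "quantms": ("sequence", False),
}


def peptide_length_histogram(rows, platform):
    """Count peptides per length for an iterable of dict-like rows."""
    column, is_direct = LENGTH_SOURCE[platform]
    counts = Counter()
    for row in rows:
        value = row.get(column)
        if not value:
            continue  # skip rows with a missing sequence or length
        counts[int(value) if is_direct else len(value)] += 1
    return dict(sorted(counts.items()))


# Tiny in-memory stand-in for a tab-separated MaxQuant evidence.txt.
example = "Sequence\tCharge\nPEPTIDEK\t2\nLVNELTEFAK\t2\nAAAAAAAK\t3\n"
rows = csv.DictReader(io.StringIO(example), delimiter="\t")
histogram = peptide_length_histogram(rows, "maxquant")
print(histogram)
```

The returned dict maps length to count, which should be straightforward to adapt to whatever plot input format the common plot functions end up using.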
2. M/Z Distribution Histogram
| Platform | Data Source | Column | Status |
|---|---|---|---|
| FragPipe | `psm.tsv` | `Observed M/Z` or `Calculated M/Z` | ✅ Ready |
| MaxQuant | `evidence.txt` | `m/z` | ✅ Ready |
| DIA-NN | `report.tsv` | Calculate from mass/charge | ⚠️ Requires calculation |
| quantms | mzTab | `exp_mass_to_charge` | ✅ Ready |
Value: Shows MS sampling range, identifies instrument biases
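For the DIA-NN case, where m/z has to be derived from mass and charge, a minimal sketch might look like this. It assumes the available mass is the neutral monoisotopic mass in Da and that precursors are positively charged; `precursor_mz` and `mz_histogram` are illustrative names, and the 50 m/z bin width is an arbitrary choice.

```python
from collections import Counter

PROTON_MASS = 1.007276466  # mass of a proton in Da


def precursor_mz(neutral_mass, charge):
    """Convert a neutral monoisotopic mass (Da) to m/z for a positive charge."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def mz_histogram(mz_values, bin_width=50.0):
    """Count m/z values per bin, keyed by the lower edge of each bin."""
    counts = Counter()
    for mz in mz_values:
        counts[int(mz // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


# Assumed (mass, charge) pairs for illustration only.
precursors = [(1000.0, 2), (1500.0, 3), (2400.0, 2)]
mz_values = [precursor_mz(mass, charge) for mass, charge in precursors]
print(mz_histogram(mz_values))
```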
Tier 2: Implement for MOST Platforms (Medium Priority)
3. M/Z vs. Delta Mass Scatter Plot
| Platform | X-axis | Y-axis | Status |
|---|---|---|---|
| FragPipe | `Calculated M/Z` | `Delta Mass` | ✅ Ready |
| MaxQuant | `m/z` | `Mass Error [ppm]` | ✅ Ready |
| DIA-NN | Calculate M/Z | `Ms1.Apex.Mz.Delta` | ⚠️ Requires calculation |
| quantms | `calc_mass_to_charge` | ppm error | ✅ Ready |
Value: Reveals m/z-dependent mass accuracy trends
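For quantms, the ppm error is not a stored column and would need to be computed from `exp_mass_to_charge` and `calc_mass_to_charge`. A minimal sketch, assuming PSM rows have already been parsed into dicts and that missing values appear as `null` (the mzTab convention); the function names are hypothetical:

```python
MISSING = (None, "", "null")  # values treated as missing in a parsed mzTab row


def ppm_error(exp_mz, calc_mz):
    """Relative mass error of the observed m/z, in parts per million."""
    return (exp_mz - calc_mz) / calc_mz * 1e6


def mz_vs_ppm_points(psm_rows):
    """Build (calculated m/z, ppm error) pairs for a scatter plot."""
    points = []
    for row in psm_rows:
        exp_raw = row.get("exp_mass_to_charge")
        calc_raw = row.get("calc_mass_to_charge")
        if exp_raw in MISSING or calc_raw in MISSING:
            continue  # skip PSMs without both m/z values
        calc_mz = float(calc_raw)
        points.append((calc_mz, ppm_error(float(exp_raw), calc_mz)))
    return points


# Illustrative rows only, not real data.
rows = [
    {"exp_mass_to_charge": "500.001", "calc_mass_to_charge": "500.0"},
    {"exp_mass_to_charge": "null", "calc_mass_to_charge": "650.3"},
]
points = mz_vs_ppm_points(rows)
print(points)
```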
4. RT vs. M/Z Scatter Plot
| Platform | X-axis | Y-axis | Status |
|---|---|---|---|
| FragPipe | `Retention` | `Calculated M/Z` | ✅ Ready |
| MaxQuant | `Retention time` | `m/z` | ✅ Ready |
| DIA-NN | `RT` | Calculate M/Z | ⚠️ Requires calculation |
| quantms | `retention_time` | `exp_mass_to_charge` | ✅ Ready |
Value: Shows chromatographic separation efficiency
Tier 3: Platform-Specific (Medium Priority)
5. Mass Error Before/After Calibration
| Platform | Before Column | After Column | Status |
|---|---|---|---|
| FragPipe | `Observed M/Z` - `Calculated M/Z` | `Calibrated Observed M/Z` - `Calculated M/Z` | ✅ Ready |
| MaxQuant | `Uncalibrated Mass Error [ppm]` | `Mass Error [ppm]` | ✅ Ready |
| DIA-NN | ❌ Not available | - | Not feasible |
| quantms | ❌ Not available | - | Not feasible |
Value: Demonstrates calibration effectiveness, critical for QC
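One possible way to summarize the FragPipe before/after comparison is the median absolute m/z difference for each state, using the column arithmetic from the table. This is only a sketch: `calibration_summary` is a hypothetical helper, and the choice of the median as the summary statistic is an assumption rather than what the FragPipe script does.

```python
from statistics import median


def calibration_summary(psm_rows):
    """Median absolute m/z error before and after calibration (psm.tsv rows)."""
    before, after = [], []
    for row in psm_rows:
        calc = float(row["Calculated M/Z"])
        before.append(abs(float(row["Observed M/Z"]) - calc))
        after.append(abs(float(row["Calibrated Observed M/Z"]) - calc))
    if not before:
        return {"before": None, "after": None}  # no PSMs to summarize
    return {"before": median(before), "after": median(after)}


# Illustrative values only, not real data.
rows = [
    {"Observed M/Z": "500.003", "Calibrated Observed M/Z": "500.0005", "Calculated M/Z": "500.0"},
    {"Observed M/Z": "600.004", "Calibrated Observed M/Z": "600.001", "Calculated M/Z": "600.0"},
    {"Observed M/Z": "700.002", "Calibrated Observed M/Z": "700.0", "Calculated M/Z": "700.0"},
]
summary = calibration_summary(rows)
print(summary)
```

A smaller "after" value than "before" would indicate that calibration reduced the mass error for that run.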
6. Number of Enzymatic Termini Distribution
| Platform | Data Source | Status |
|---|---|---|
| FragPipe | `psm.tsv` → `Number of Enzymatic Termini` | ✅ Ready |
| MaxQuant | Not directly available | ❌ Not feasible |
| DIA-NN | Not available | ❌ Not feasible |
| quantms | Not available | ❌ Not feasible |
Value: Shows enzyme specificity (fully vs semi-tryptic)
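The distribution itself is a simple count over the `Number of Enzymatic Termini` column. The sketch below assumes the column holds 0, 1, or 2 and that these correspond to non-, semi-, and fully enzymatic peptides; the label mapping and function name are illustrative.

```python
from collections import Counter

# Assumed interpretation of the column values.
TERMINI_LABELS = {2: "fully enzymatic", 1: "semi-enzymatic", 0: "non-enzymatic"}


def enzymatic_termini_distribution(psm_rows):
    """Count PSMs per enzymatic-termini class from parsed psm.tsv rows."""
    counts = Counter()
    for row in psm_rows:
        n_termini = int(row["Number of Enzymatic Termini"])
        label = TERMINI_LABELS.get(n_termini, f"unexpected ({n_termini})")
        counts[label] += 1
    return dict(counts)


# Illustrative rows only, not real data.
rows = [
    {"Number of Enzymatic Termini": "2"},
    {"Number of Enzymatic Termini": "2"},
    {"Number of Enzymatic Termini": "1"},
]
distribution = enzymatic_termini_distribution(rows)
print(distribution)
```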
Tier 4: Advanced/Future (Lower Priority)
7. Percolator Feature Weights
| Platform | Status |
|---|---|
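| FragPipe | ⚠️ Feasible but requires additional parsing/calculation |
| MaxQuant | ❌ Andromeda doesn't use Percolator |
| DIA-NN | ❌ Neural network not interpretable |
| quantms | ⚠️ Feasible but requires additional parsing/calculation |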
8. RT Calibration Quality (extend existing)
| Platform | Status |
|---|---|
| FragPipe | ⚠️ Feasible but requires additional parsing/calculation |
| MaxQuant | ❌ No RT prediction |
| DIA-NN | ✅ Already partial (RT vs Predicted.RT) - extend |
| quantms | ✅ In DIA mode - extend |
Summary: Implementation Matrix
| Metric | FragPipe | MaxQuant | DIA-NN | quantms | Priority |
|---|---|---|---|---|---|
| Peptide Length Distribution | ✅ | ✅ | ✅ | ✅ | 🔴 HIGH |
| M/Z Distribution | ✅ | ✅ | ⚠️ | ✅ | 🔴 HIGH |
| M/Z vs Delta Mass Scatter | ✅ | ✅ | ⚠️ | ✅ | 🟡 MEDIUM |
| RT vs M/Z Scatter | ✅ | ✅ | ⚠️ | ✅ | 🟡 MEDIUM |
| Mass Error Before/After Cal | ✅ | ✅ | ❌ | ❌ | 🟡 MEDIUM |
| Enzymatic Termini | ✅ | ❌ | ❌ | ❌ | 🟢 LOW |
| RT Calibration Quality | ⚠️ | ❌ | ✅ | ✅ | 🟡 MEDIUM |
| Percolator Weights | ⚠️ | ❌ | ❌ | ⚠️ | 🟢 LOW |
Legend:
- ✅ = Fully feasible with existing data
- ⚠️ = Feasible but requires additional parsing/calculation
- ❌ = Not feasible (data not available)
Recommended Implementation Order
Phase 1: Quick Wins
- Peptide Length Distribution (all platforms)
- M/Z Distribution Histogram (all platforms)
Phase 2: Scatter Plots
- M/Z vs Delta Mass Scatter (FragPipe, MaxQuant, quantms)
- RT vs M/Z Scatter (FragPipe, MaxQuant, quantms)
Phase 3: Calibration Analysis
- Mass Error Before/After Calibration (FragPipe, MaxQuant)
Phase 4: Platform-Specific
- Enzymatic Termini Distribution (FragPipe only)
- Extended RT Calibration plots (DIA-NN, quantms DIA mode)
Technical Notes
Proposed Plot Locations
- Peptide Length Distribution → `ms2` or `identification` section
- M/Z Distribution → `ms2` section
- Scatter plots → `mass_error` section (M/Z vs Delta) or `rt_qc` section (RT vs M/Z)
- Calibration comparison → `mass_error` section
Code Architecture
New common plot functions should be added to `pmultiqc/modules/common/plots/`:
- `draw_peptide_length_distribution()`
- `draw_mz_distribution()`
- `draw_mz_vs_delta_mass_scatter()`
- `draw_rt_vs_mz_scatter()`
- `draw_calibration_comparison()`
Data extraction logic should be added to each tool's utility module.
Data Column Reference
FragPipe (psm.tsv)
- `Peptide Length` (direct)
- `Observed M/Z`, `Calculated M/Z`, `Calibrated Observed M/Z`
- `Delta Mass`
- `Retention`
- `Charge`
- `Number of Enzymatic Termini`
- `Hyperscore`, `Expectation`
MaxQuant (evidence.txt)
- `Sequence` (calculate length)
- `m/z`
- `Mass Error [Da]`, `Mass Error [ppm]`, `Uncalibrated Mass Error [ppm]`
- `Retention time`, `Retention length`
- `Charge`
- `Score`
DIA-NN (report.tsv)
- `Stripped.Sequence` (calculate length)
- `Precursor.Charge`, mass (calculate M/Z)
- `Ms1.Apex.Mz.Delta`
- `RT`, `Predicted.RT`, `iRT`
- `Precursor.Quantity`
quantms (mzTab)
- `sequence` (calculate length)
- `exp_mass_to_charge`, `calc_mass_to_charge` (mass error is calculated from these)
- `retention_time`
- `charge`
- `search_engine_score[1]`
Related
- FragPipe QC script: https://github.com/Nesvilab/FragPipe/blob/develop/tools/generate_reports_pdf.py
Issue Type
Suggestion