[GENERAL] New metrics based on FragPipe feedback #546

@ypriverol

Issue Description

Feature Request: Add new QC metrics inspired by FragPipe generate_reports_pdf.py

Summary

Based on analysis of FragPipe's generate_reports_pdf.py QC script (suggested by the FragPipe development team), we have identified several valuable QC metrics that could be added to pmultiqc. This issue documents which metrics can be implemented for each supported platform (FragPipe, MaxQuant, DIA-NN, quantms).

Reference: https://github.com/Nesvilab/FragPipe/blob/develop/tools/generate_reports_pdf.py


New QC Metrics Overview

Metrics Already in pmultiqc

  • ✅ Charge Distribution
  • ✅ Missed Cleavages
  • ✅ Delta Mass (Da/ppm)
  • ✅ PSM/Peptide/Protein Counts
  • ✅ IDs over RT
  • ✅ Peptide Intensity
  • ✅ Search Engine Scores
  • ✅ Modifications
  • ✅ Contaminants
  • ✅ Heatmap Summary

Metrics NOT Yet in pmultiqc (Proposed)

| Metric | Description |
|---|---|
| Peptide Length Distribution | Histogram showing peptide length distribution |
| M/Z Distribution | Histogram of precursor m/z values |
| MS1/MS2 Mass Error Before/After Calibration | Grouped comparison of calibration effect |
| M/Z vs. Delta Mass Scatter | 2D scatter revealing m/z-dependent mass accuracy |
| RT vs. M/Z Scatter | 2D scatter showing chromatographic separation |
| Percolator Feature Weights | ML feature importance visualization |
| Number of Enzymatic Termini | Distribution of fully/semi-tryptic peptides |
| RT Calibration Quality | RT prediction accuracy plots |

Implementation Feasibility by Platform

Tier 1: Implement for ALL Platforms (High Priority)

1. Peptide Length Distribution

| Platform | Data Source | Column/Calculation | Status |
|---|---|---|---|
| FragPipe | psm.tsv | Peptide Length (direct) | ✅ Ready |
| MaxQuant | evidence.txt | len(Sequence) | ✅ Ready |
| DIA-NN | report.tsv | len(Stripped.Sequence) | ✅ Ready |
| quantms | mzTab | len(sequence) | ✅ Ready |

Value: Reveals digestion efficiency, identifies degradation or incomplete digestion
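
As a rough illustration of the calculation in the table above, the following is a minimal sketch that turns a list of bare peptide sequences into a length-to-count mapping. The function name `peptide_length_distribution` and the `deduplicate` flag are hypothetical and not part of pmultiqc; whether the plot should count unique peptides or every PSM is an open design choice, so both are supported here.

```python
from collections import Counter


def peptide_length_distribution(sequences, deduplicate=True):
    """Count peptides per length from bare (unmodified) sequences.

    Hypothetical helper, not an existing pmultiqc function.
    Returns a dict {length: count} sorted by length.
    """
    # Keep only non-empty strings; missing values (None, NaN) are skipped.
    cleaned = [s.strip() for s in sequences if isinstance(s, str) and s.strip()]
    if deduplicate:
        # Assumption: count each distinct peptide once rather than once per PSM.
        cleaned = set(cleaned)
    counts = Counter(len(s) for s in cleaned)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    demo = ["PEPTIDEK", "PEPTIDEK", "ELVISLIVESK", "AAAK", ""]
    print(peptide_length_distribution(demo))
    print(peptide_length_distribution(demo, deduplicate=False))
```

For FragPipe the `Peptide Length` column can be counted directly, so only the final counting step would apply.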

2. M/Z Distribution Histogram

| Platform | Data Source | Column | Status |
|---|---|---|---|
| FragPipe | psm.tsv | Observed M/Z or Calculated M/Z | ✅ Ready |
| MaxQuant | evidence.txt | m/z | ✅ Ready |
| DIA-NN | report.tsv | Calculate from mass/charge | ⚠️ Needs calculation |
| quantms | mzTab | exp_mass_to_charge | ✅ Ready |

Value: Shows MS sampling range, identifies instrument biases
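
The DIA-NN row needs m/z derived from mass and charge. A minimal sketch of that conversion follows, assuming the available mass is the neutral monoisotopic precursor mass and that the precursor is protonated (positive mode). The function names are hypothetical.

```python
PROTON_MASS = 1.007276466621  # Da, mass of a proton


def mz_from_neutral_mass(neutral_mass, charge):
    """Convert a neutral monoisotopic mass (Da) to m/z for a protonated ion.

    Hypothetical helper: m/z = (M + z * m_proton) / z
    """
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def neutral_mass_from_mz(mz, charge):
    """Inverse conversion: M = z * (m/z - m_proton)."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON_MASS)


if __name__ == "__main__":
    # A 1000 Da peptide observed as a doubly charged precursor.
    print(mz_from_neutral_mass(1000.0, 2))
```

If the input mass already includes the charge carriers (for example an [M+H]+ value), the formula has to be adjusted accordingly.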


Tier 2: Implement for MOST Platforms (Medium Priority)

3. M/Z vs. Delta Mass Scatter Plot

| Platform | X-axis | Y-axis | Status |
|---|---|---|---|
| FragPipe | Calculated M/Z | Delta Mass | ✅ Ready |
| MaxQuant | m/z | Mass Error [ppm] | ✅ Ready |
| DIA-NN | Calculate M/Z | Ms1.Apex.Mz.Delta | ⚠️ Needs M/Z calc |
| quantms | calc_mass_to_charge | ppm error | ✅ Ready |

Value: Reveals m/z-dependent mass accuracy trends
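
For quantms the ppm error is not a stored column and has to be computed from `exp_mass_to_charge` and `calc_mass_to_charge`. The sketch below shows one way to build the (m/z, ppm error) points for the scatter plot; the function names and the row-of-dicts input format are assumptions for illustration, not pmultiqc code.

```python
import math


def ppm_error(observed_mz, calculated_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - calculated_mz) / calculated_mz * 1e6


def mz_vs_ppm_points(rows, exp_key="exp_mass_to_charge", calc_key="calc_mass_to_charge"):
    """Build (calculated m/z, ppm error) pairs from PSM rows given as dicts.

    Rows with missing or non-numeric values (e.g. mzTab "null") are skipped.
    """
    points = []
    for row in rows:
        try:
            exp = float(row[exp_key])
            calc = float(row[calc_key])
        except (KeyError, TypeError, ValueError):
            continue
        if not (math.isfinite(exp) and math.isfinite(calc)) or calc <= 0:
            continue
        points.append((calc, ppm_error(exp, calc)))
    return points


if __name__ == "__main__":
    psms = [
        {"exp_mass_to_charge": "500.0005", "calc_mass_to_charge": "500.0"},
        {"exp_mass_to_charge": "null", "calc_mass_to_charge": "650.3"},
    ]
    print(mz_vs_ppm_points(psms))
```

Plotting every PSM can produce very large scatter plots, so some form of downsampling or binning may be worth considering before rendering.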

4. RT vs. M/Z Scatter Plot

| Platform | X-axis | Y-axis | Status |
|---|---|---|---|
| FragPipe | Retention | Calculated M/Z | ✅ Ready |
| MaxQuant | Retention time | m/z | ✅ Ready |
| DIA-NN | RT | Calculate M/Z | ⚠️ Needs M/Z calc |
| quantms | retention_time | exp_mass_to_charge | ✅ Ready |

Value: Shows chromatographic separation efficiency


Tier 3: Platform-Specific (Medium Priority)

5. Mass Error Before/After Calibration

| Platform | Before Column | After Column | Status |
|---|---|---|---|
| FragPipe | Observed M/Z - Calculated M/Z | Calibrated Observed M/Z - Calculated M/Z | ✅ Ready |
| MaxQuant | Uncalibrated Mass Error [ppm] | Mass Error [ppm] | ✅ Ready |
| DIA-NN | ❌ Not available | - | Not feasible |
| quantms | ❌ Not available | - | Not feasible |

Value: Demonstrates calibration effectiveness, critical for QC
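
To make the FragPipe row concrete, here is a minimal sketch that converts the before and after differences to ppm and summarises each with its median. The function name, the returned structure, and the choice of median statistics are assumptions; the actual plot might show full distributions instead.

```python
from statistics import median


def calibration_summary(observed_mz, calibrated_mz, calculated_mz):
    """Summarise mass error (ppm) before and after calibration.

    Hypothetical helper. Inputs are equal-length sequences taken from the
    Observed M/Z, Calibrated Observed M/Z and Calculated M/Z columns.
    """
    if not (len(observed_mz) == len(calibrated_mz) == len(calculated_mz)):
        raise ValueError("all inputs must have the same length")
    if not calculated_mz:
        raise ValueError("inputs must not be empty")

    def to_ppm(values):
        return [(v - c) / c * 1e6 for v, c in zip(values, calculated_mz)]

    summary = {}
    for label, values in (("before", observed_mz), ("after", calibrated_mz)):
        errors = to_ppm(values)
        summary[label] = {
            "median_ppm": median(errors),  # systematic offset
            "median_abs_ppm": median(abs(e) for e in errors),  # typical error size
        }
    return summary


if __name__ == "__main__":
    calculated = [500.0, 600.0, 700.0]
    observed = [500.0025, 600.003, 700.0035]       # about +5 ppm
    calibrated = [500.00025, 600.0003, 700.00035]  # about +0.5 ppm
    print(calibration_summary(observed, calibrated, calculated))
```

For MaxQuant both ppm columns already exist, so only the summary step would be needed.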

6. Number of Enzymatic Termini Distribution

| Platform | Data Source | Status |
|---|---|---|
| FragPipe | psm.tsv: Number of Enzymatic Termini | ✅ Ready |
| MaxQuant | Not directly available | ❌ Not feasible |
| DIA-NN | Not available | ❌ Not feasible |
| quantms | Not available | ❌ Not feasible |

Value: Shows enzyme specificity (fully vs semi-tryptic)


Tier 4: Advanced/Future (Lower Priority)

7. Percolator Feature Weights

| Platform | Status |
|---|---|
| FragPipe | ⚠️ Requires log file parsing |
| MaxQuant | ❌ Andromeda doesn't use Percolator |
| DIA-NN | ❌ Neural network not interpretable |
| quantms | ⚠️ Requires Percolator log parsing |

8. RT Calibration Quality (extend existing)

| Platform | Status |
|---|---|
| FragPipe | ⚠️ Requires MSBooster output files |
| MaxQuant | ❌ No RT prediction |
| DIA-NN | ✅ Already partial (RT vs Predicted.RT) - extend |
| quantms | ✅ In DIA mode - extend |

Summary: Implementation Matrix

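| Metric | FragPipe | MaxQuant | DIA-NN | quantms | Priority |
|---|---|---|---|---|---|
| Peptide Length Distribution | ✅ | ✅ | ✅ | ✅ | 🔴 HIGH |
| M/Z Distribution | ✅ | ✅ | ⚠️ | ✅ | 🔴 HIGH |
| M/Z vs Delta Mass Scatter | ✅ | ✅ | ⚠️ | ✅ | 🟡 MEDIUM |
| RT vs M/Z Scatter | ✅ | ✅ | ⚠️ | ✅ | 🟡 MEDIUM |
| Mass Error Before/After Cal | ✅ | ✅ | ❌ | ❌ | 🟡 MEDIUM |
| Enzymatic Termini | ✅ | ❌ | ❌ | ❌ | 🟢 LOW |
| RT Calibration Quality | ⚠️ | ❌ | ✅ | ✅ | 🟡 MEDIUM |
| Percolator Weights | ⚠️ | ❌ | ❌ | ⚠️ | 🟢 LOW |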

Legend:

  • ✅ = Fully feasible with existing data
  • ⚠️ = Feasible but requires additional parsing/calculation
  • ❌ = Not feasible (data not available)

Recommended Implementation Order

Phase 1: Quick Wins

  • Peptide Length Distribution (all platforms)
  • M/Z Distribution Histogram (all platforms)

Phase 2: Scatter Plots

  • M/Z vs Delta Mass Scatter (FragPipe, MaxQuant, quantms)
  • RT vs M/Z Scatter (FragPipe, MaxQuant, quantms)

Phase 3: Calibration Analysis

  • Mass Error Before/After Calibration (FragPipe, MaxQuant)

Phase 4: Platform-Specific

  • Enzymatic Termini Distribution (FragPipe only)
  • Extended RT Calibration plots (DIA-NN, quantms DIA mode)

Technical Notes

Proposed Plot Locations

  • Peptide Length Distribution → ms2 or identification section
  • M/Z Distribution → ms2 section
  • Scatter plots → mass_error section (M/Z vs Delta) or rt_qc section (RT vs M/Z)
  • Calibration comparison → mass_error section

Code Architecture

New common plot functions should be added to pmultiqc/modules/common/plots/:

  • draw_peptide_length_distribution()
  • draw_mz_distribution()
  • draw_mz_vs_delta_mass_scatter()
  • draw_rt_vs_mz_scatter()
  • draw_calibration_comparison()

Data extraction logic should be added to each tool's utility module.
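
One possible shape for the per-tool extraction logic is a small column map plus a shared reader, sketched below for peptide length only. Everything here is hypothetical: `PEPTIDE_LENGTH_SOURCES` and `extract_peptide_lengths` are illustrative names, not existing pmultiqc code, and the column names are the ones listed in the Data Column Reference of this issue. mzTab is left out because its PSM section is not a plain tab-separated table.

```python
import csv
import io

# Hypothetical per-platform mapping: which column to read and whether it
# already holds the length or a sequence whose length must be computed.
PEPTIDE_LENGTH_SOURCES = {
    "fragpipe": {"column": "Peptide Length", "is_length": True},
    "maxquant": {"column": "Sequence", "is_length": False},
    "diann": {"column": "Stripped.Sequence", "is_length": False},
}


def extract_peptide_lengths(handle, platform):
    """Read a tab-separated results table and return peptide lengths.

    `handle` is any text file object (psm.tsv, evidence.txt, report.tsv).
    """
    spec = PEPTIDE_LENGTH_SOURCES[platform]
    reader = csv.DictReader(handle, delimiter="\t")
    lengths = []
    for row in reader:
        value = (row.get(spec["column"]) or "").strip()
        if not value:
            continue  # skip rows where the column is missing or empty
        lengths.append(int(value) if spec["is_length"] else len(value))
    return lengths


if __name__ == "__main__":
    evidence = io.StringIO("Sequence\tCharge\nPEPTIDEK\t2\nAAAK\t1\n")
    print(extract_peptide_lengths(evidence, "maxquant"))
```

The resulting lengths could then be passed to a common plot function such as the proposed draw_peptide_length_distribution(), keeping tool-specific parsing separate from plotting.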


Data Column Reference

FragPipe (psm.tsv)

  • Peptide Length - direct
  • Observed M/Z, Calculated M/Z, Calibrated Observed M/Z
  • Delta Mass
  • Retention
  • Charge
  • Number of Enzymatic Termini
  • Hyperscore, Expectation

MaxQuant (evidence.txt)

  • Sequence (calculate length)
  • m/z
  • Mass Error [Da], Mass Error [ppm], Uncalibrated Mass Error [ppm]
  • Retention time, Retention length
  • Charge
  • Score

DIA-NN (report.tsv)

  • Stripped.Sequence (calculate length)
  • Precursor.Charge, mass (calculate M/Z)
  • Ms1.Apex.Mz.Delta
  • RT, Predicted.RT, iRT
  • Precursor.Quantity

quantms (mzTab)

  • sequence (calculate length)
  • exp_mass_to_charge, calc_mass_to_charge
  • mass error (calculated)
  • retention_time
  • charge
  • search_engine_score[1]

Related

Issue Type

Suggestion
