[GENERAL] New metrics based on FragPipe feedback #546

### Issue Description

# Feature Request: Add new QC metrics inspired by FragPipe generate_reports_pdf.py
 
## Summary
 
Based on analysis of FragPipe's `generate_reports_pdf.py` QC script (suggested by the FragPipe development team), we have identified several valuable QC metrics that could be added to pmultiqc. This issue documents which metrics can be implemented for each supported platform (FragPipe, MaxQuant, DIA-NN, quantms).
 
**Reference:** https://github.com/Nesvilab/FragPipe/blob/develop/tools/generate_reports_pdf.py
 
---
 
## New QC Metrics Overview
 
### Metrics Already in pmultiqc
- ✅ Charge Distribution
- ✅ Missed Cleavages
- ✅ Delta Mass (Da/ppm)
- ✅ PSM/Peptide/Protein Counts
- ✅ IDs over RT
- ✅ Peptide Intensity
- ✅ Search Engine Scores
- ✅ Modifications
- ✅ Contaminants
- ✅ Heatmap Summary
 
### Metrics NOT Yet in pmultiqc (Proposed)
| Metric | Description |
|--------|-------------|
| Peptide Length Distribution | Histogram showing peptide length distribution |
| M/Z Distribution | Histogram of precursor m/z values |
| MS1/MS2 Mass Error Before/After Calibration | Grouped comparison of calibration effect |
| M/Z vs. Delta Mass Scatter | 2D scatter revealing m/z-dependent mass accuracy |
| RT vs. M/Z Scatter | 2D scatter showing chromatographic separation |
| Percolator Feature Weights | ML feature importance visualization |
| Number of Enzymatic Termini | Distribution of fully/semi-tryptic peptides |
| RT Calibration Quality | RT prediction accuracy plots |
 
---
 
## Implementation Feasibility by Platform
 
### Tier 1: Implement for ALL Platforms (High Priority)
 
#### 1. Peptide Length Distribution
| Platform | Data Source | Column/Calculation | Status |
|----------|-------------|-------------------|--------|
| FragPipe | `psm.tsv` | `Peptide Length` (direct) | ✅ Ready |
| MaxQuant | `evidence.txt` | `len(Sequence)` | ✅ Ready |
| DIA-NN | `report.tsv` | `len(Stripped.Sequence)` | ✅ Ready |
| quantms | `mzTab` | `len(sequence)` | ✅ Ready |
 
**Value:** Reveals digestion efficiency and flags sample degradation or incomplete digestion
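
For the platforms that need `len(...)`, the calculation could look roughly like the sketch below. The function name `peptide_length_counts` is hypothetical and not part of pmultiqc; the input is assumed to be the unmodified sequence column (`Sequence`, `Stripped.Sequence` or `sequence`) already read into a list.

```python
from collections import Counter


def peptide_length_counts(sequences):
    """Count how many peptides have each length.

    `sequences` is an iterable of unmodified peptide strings, e.g. the
    MaxQuant `Sequence` or DIA-NN `Stripped.Sequence` column.
    Empty or missing values are skipped.
    """
    counts = Counter(len(seq) for seq in sequences if seq)
    # Sort by length so the result can be plotted directly as a histogram.
    return dict(sorted(counts.items()))


# Toy input, not real data.
example = ["PEPTIDEK", "ACDEFGHIK", "LMNPQR", "STVWYK", "PEPTIDEK"]
length_hist = peptide_length_counts(example)
```

For FragPipe the `Peptide Length` column can be counted directly, so no string handling is needed there.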
 
#### 2. M/Z Distribution Histogram
| Platform | Data Source | Column | Status |
|----------|-------------|--------|--------|
| FragPipe | `psm.tsv` | `Observed M/Z` or `Calculated M/Z` | ✅ Ready |
| MaxQuant | `evidence.txt` | `m/z` | ✅ Ready |
| DIA-NN | `report.tsv` | Calculate from mass/charge | ⚠️ Needs calculation |
| quantms | `mzTab` | `exp_mass_to_charge` | ✅ Ready |
 
**Value:** Shows the m/z range sampled by the instrument and helps identify instrument biases
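
For DIA-NN, the "calculate from mass/charge" step could be sketched as follows. This assumes the available mass is the neutral monoisotopic mass of the precursor and that ions are protonated (positive mode); the helper name `mz_from_neutral_mass` is hypothetical.

```python
PROTON_MASS = 1.007276467  # approximate mass of a proton in Da


def mz_from_neutral_mass(neutral_mass, charge):
    """Convert a neutral monoisotopic mass to m/z for an [M + zH]z+ ion."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    # Each charge adds one proton; the total is then divided by the charge.
    return (neutral_mass + charge * PROTON_MASS) / charge


# Toy value: a 1000 Da peptide observed as a doubly charged ion.
mz_2plus = mz_from_neutral_mass(1000.0, 2)
```

If the report already exposes a precursor m/z column in a given DIA-NN version, reading it directly would be preferable to recomputing it.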
 
---
 
### Tier 2: Implement for MOST Platforms (Medium Priority)
 
#### 3. M/Z vs. Delta Mass Scatter Plot
| Platform | X-axis | Y-axis | Status |
|----------|--------|--------|--------|
| FragPipe | `Calculated M/Z` | `Delta Mass` | ✅ Ready |
| MaxQuant | `m/z` | `Mass Error [ppm]` | ✅ Ready |
| DIA-NN | Calculate M/Z | `Ms1.Apex.Mz.Delta` | ⚠️ Needs M/Z calc |
| quantms | `calc_mass_to_charge` | ppm error | ✅ Ready |
 
**Value:** Reveals m/z-dependent mass accuracy trends
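
For quantms, the "ppm error" on the Y-axis is not a column and has to be derived from `exp_mass_to_charge` and `calc_mass_to_charge`. A minimal sketch, assuming each PSM row is available as a dict keyed by the mzTab column names (the function names are hypothetical):

```python
def ppm_error(exp_mz, calc_mz):
    """Relative mass error in parts per million (observed vs. theoretical)."""
    return (exp_mz - calc_mz) / calc_mz * 1e6


def mz_vs_ppm_points(psm_rows):
    """Build (x, y) scatter points: theoretical m/z against ppm error."""
    points = []
    for row in psm_rows:
        calc_mz = float(row["calc_mass_to_charge"])
        exp_mz = float(row["exp_mass_to_charge"])
        points.append((calc_mz, ppm_error(exp_mz, calc_mz)))
    return points


# Toy rows, not real data.
rows = [
    {"exp_mass_to_charge": "500.0025", "calc_mass_to_charge": "500.0"},
    {"exp_mass_to_charge": "799.9984", "calc_mass_to_charge": "800.0"},
]
points = mz_vs_ppm_points(rows)
```

A positive value means the observed m/z is higher than the theoretical one.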
 
#### 4. RT vs. M/Z Scatter Plot
| Platform | X-axis | Y-axis | Status |
|----------|--------|--------|--------|
| FragPipe | `Retention` | `Calculated M/Z` | ✅ Ready |
| MaxQuant | `Retention time` | `m/z` | ✅ Ready |
| DIA-NN | `RT` | Calculate M/Z | ⚠️ Needs M/Z calc |
| quantms | `retention_time` | `exp_mass_to_charge` | ✅ Ready |
 
**Value:** Shows chromatographic separation efficiency
 
---
 
### Tier 3: Platform-Specific (Medium to Low Priority)
 
#### 5. Mass Error Before/After Calibration
| Platform | Before Column | After Column | Status |
|----------|---------------|--------------|--------|
| FragPipe | `Observed M/Z - Calculated M/Z` | `Calibrated Observed M/Z - Calculated M/Z` | ✅ Ready |
| MaxQuant | `Uncalibrated Mass Error [ppm]` | `Mass Error [ppm]` | ✅ Ready |
| DIA-NN | Not available | - | ❌ Not feasible |
| quantms | Not available | - | ❌ Not feasible |
 
**Value:** Demonstrates how effective mass calibration was, which is critical for QC
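
For FragPipe, the before/after values are differences in m/z rather than ppm. One possible way to summarise the calibration effect is sketched below; converting to ppm and comparing the median absolute error are assumptions made for illustration, not something the FragPipe script is known to do, and `calibration_summary` is a hypothetical name.

```python
from statistics import median


def calibration_summary(psm_rows):
    """Median absolute mass error (ppm) before and after calibration.

    Each row is a dict with the `psm.tsv` columns `Observed M/Z`,
    `Calibrated Observed M/Z` and `Calculated M/Z` as floats.
    """
    before, after = [], []
    for row in psm_rows:
        calc = row["Calculated M/Z"]
        before.append(abs(row["Observed M/Z"] - calc) / calc * 1e6)
        after.append(abs(row["Calibrated Observed M/Z"] - calc) / calc * 1e6)
    return {"before": median(before), "after": median(after)}


# Toy rows, not real data.
rows = [
    {"Observed M/Z": 500.005, "Calibrated Observed M/Z": 500.001, "Calculated M/Z": 500.0},
    {"Observed M/Z": 600.003, "Calibrated Observed M/Z": 600.0006, "Calculated M/Z": 600.0},
    {"Observed M/Z": 800.008, "Calibrated Observed M/Z": 799.9992, "Calculated M/Z": 800.0},
]
summary = calibration_summary(rows)
```

For MaxQuant no conversion is needed, since `Uncalibrated Mass Error [ppm]` and `Mass Error [ppm]` are already in ppm.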
 
#### 6. Number of Enzymatic Termini Distribution
| Platform | Data Source | Status |
|----------|-------------|--------|
| FragPipe | `psm.tsv` → `Number of Enzymatic Termini` | ✅ Ready |
| MaxQuant | Not directly available | ❌ Not feasible |
| DIA-NN | Not available | ❌ Not feasible |
| quantms | Not available | ❌ Not feasible |
 
**Value:** Shows enzyme specificity (fully vs semi-tryptic)
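
A small sketch of how the FragPipe column could be turned into a distribution. It assumes `Number of Enzymatic Termini` holds 0, 1 or 2 per PSM; the labels and the name `termini_distribution` are hypothetical.

```python
from collections import Counter

# Assumed meaning of the column values; adjust if the data disagrees.
TERMINI_LABELS = {
    2: "fully enzymatic",
    1: "semi-enzymatic",
    0: "non-enzymatic",
}


def termini_distribution(values):
    """Return the fraction of PSMs per number of enzymatic termini."""
    counts = Counter(int(v) for v in values)
    total = sum(counts.values())
    return {
        TERMINI_LABELS.get(n, f"{n} termini"): counts[n] / total
        for n in sorted(counts, reverse=True)
    }


# Toy input, not real data.
fractions = termini_distribution([2, 2, 2, 1])
```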
 
---
 
### Tier 4: Advanced/Future (Medium to Low Priority)
 
#### 7. Percolator Feature Weights
| Platform | Status |
|----------|--------|
| FragPipe | ⚠️ Requires log file parsing |
| MaxQuant | ❌ Andromeda doesn't use Percolator |
| DIA-NN | ❌ Neural network not interpretable |
| quantms | ⚠️ Requires Percolator log parsing |
 
#### 8. RT Calibration Quality (extend existing)
| Platform | Status |
|----------|--------|
| FragPipe | ⚠️ Requires MSBooster output files |
| MaxQuant | ❌ No RT prediction |
| DIA-NN | ✅ Partially implemented (`RT` vs. `Predicted.RT`); extend |
| quantms | ✅ Available in DIA mode; extend |
 
---
 
## Summary: Implementation Matrix
 
| Metric | FragPipe | MaxQuant | DIA-NN | quantms | Priority |
|--------|:--------:|:--------:|:------:|:-------:|:--------:|
| Peptide Length Distribution | ✅ | ✅ | ✅ | ✅ | 🔴 HIGH |
| M/Z Distribution | ✅ | ✅ | ⚠️ | ✅ | 🔴 HIGH |
| M/Z vs Delta Mass Scatter | ✅ | ✅ | ⚠️ | ✅ | 🟡 MEDIUM |
| RT vs M/Z Scatter | ✅ | ✅ | ⚠️ | ✅ | 🟡 MEDIUM |
| Mass Error Before/After Cal | ✅ | ✅ | ❌ | ❌ | 🟡 MEDIUM |
| Enzymatic Termini | ✅ | ❌ | ❌ | ❌ | 🟢 LOW |
| RT Calibration Quality | ⚠️ | ❌ | ✅ | ✅ | 🟡 MEDIUM |
| Percolator Weights | ⚠️ | ❌ | ❌ | ⚠️ | 🟢 LOW |
 
**Legend:**
- ✅ = Fully feasible with existing data
- ⚠️ = Feasible but requires additional parsing/calculation
- ❌ = Not feasible (data not available)
 
---
 
## Recommended Implementation Order
 
### Phase 1: Quick Wins
- [ ] Peptide Length Distribution (all platforms)
- [ ] M/Z Distribution Histogram (all platforms)
 
### Phase 2: Scatter Plots
- [ ] M/Z vs Delta Mass Scatter (FragPipe, MaxQuant, quantms)
- [ ] RT vs M/Z Scatter (FragPipe, MaxQuant, quantms)
 
### Phase 3: Calibration Analysis
- [ ] Mass Error Before/After Calibration (FragPipe, MaxQuant)
 
### Phase 4: Platform-Specific
- [ ] Enzymatic Termini Distribution (FragPipe only)
- [ ] Extended RT Calibration plots (DIA-NN, quantms DIA mode)
 
---
 
## Technical Notes
 
### Proposed Plot Locations
- Peptide Length Distribution → `ms2` or `identification` section
- M/Z Distribution → `ms2` section
- M/Z vs. Delta Mass Scatter → `mass_error` section
- RT vs. M/Z Scatter → `rt_qc` section
- Calibration comparison → `mass_error` section
 
### Code Architecture
New common plot functions should be added to `pmultiqc/modules/common/plots/`:
- `draw_peptide_length_distribution()`
- `draw_mz_distribution()`
- `draw_mz_vs_delta_mass_scatter()`
- `draw_rt_vs_mz_scatter()`
- `draw_calibration_comparison()`
 
Data extraction logic should be added to each tool's utility module.
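
As a rough illustration of keeping extraction per tool while sharing the plot code, the sketch below maps each platform to the RT and m/z columns listed for the RT vs. M/Z scatter. `RT_MZ_COLUMNS` and `rt_mz_points` are hypothetical names, not existing pmultiqc code, and rows are assumed to be dicts keyed by column name.

```python
# Columns taken from the RT vs. M/Z table in this issue.
RT_MZ_COLUMNS = {
    "fragpipe": ("Retention", "Calculated M/Z"),
    "maxquant": ("Retention time", "m/z"),
    "quantms": ("retention_time", "exp_mass_to_charge"),
}


def rt_mz_points(platform, rows):
    """Return (rt, mz) tuples for platforms with direct RT and m/z columns."""
    try:
        rt_col, mz_col = RT_MZ_COLUMNS[platform.lower()]
    except KeyError:
        # DIA-NN lands here because its m/z must be calculated first.
        raise ValueError(f"No direct RT/m/z columns known for {platform!r}") from None
    return [(float(row[rt_col]), float(row[mz_col])) for row in rows]


# Toy row, not real data.
points = rt_mz_points("MaxQuant", [{"Retention time": "12.5", "m/z": "650.3"}])
```

A shared function such as `draw_rt_vs_mz_scatter()` could then accept the returned points without knowing which tool produced them. Retention time units may differ between tools, so they should be checked before values are shown on a common axis.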
 
---
 
## Data Column Reference
 
### FragPipe (`psm.tsv`)
- `Peptide Length` - direct
- `Observed M/Z`, `Calculated M/Z`, `Calibrated Observed M/Z`
- `Delta Mass`
- `Retention`
- `Charge`
- `Number of Enzymatic Termini`
- `Hyperscore`, `Expectation`
 
### MaxQuant (`evidence.txt`)
- `Sequence` (calculate length)
- `m/z`
- `Mass Error [Da]`, `Mass Error [ppm]`, `Uncalibrated Mass Error [ppm]`
- `Retention time`, `Retention length`
- `Charge`
- `Score`
 
### DIA-NN (`report.tsv`)
- `Stripped.Sequence` (calculate length)
- `Precursor.Charge`, mass (calculate M/Z)
- `Ms1.Apex.Mz.Delta`
- `RT`, `Predicted.RT`, `iRT`
- `Precursor.Quantity`
 
### quantms (`mzTab`)
- `sequence` (calculate length)
- `exp_mass_to_charge`, `calc_mass_to_charge`
- mass error (calculated)
- `retention_time`
- `charge`
- `search_engine_score[1]`
 
---
 
## Related
- FragPipe QC script: https://github.com/Nesvilab/FragPipe/blob/develop/tools/generate_reports_pdf.py

### Issue Type

Suggestion

