[GENERAL] Improve handling and documentation of large datasets in reports #466

@ypriverol

Issue Description

For large-scale datasets with a high number of files/runs (e.g., 50+), the current reports and metrics may stop working. As studies get larger, HTML reports can become slow to render, hard to navigate, or can even crash browsers. Cloud-based (public/hosted) services may also hit resource limits.

Proposal:

  • Profile performance on large input sets first (datasets with more than 50 runs).
  • Add documentation warnings, or detection at report time, when an input is above recommended limits (runs/files, RAM, browser memory); a detection sketch follows this list.
  • Add options for dynamic loading/pagination of report sections, splitting reports by sample or batch, and showing summaries first with links to details; a batching sketch also follows this list.
  • Allow a more streamlined or minimal reporting output (summary-only, or per batch/sample) as an option.
  • Provide clear error/warning messages in the UI/logs when practical reporting limits are hit.
  • Document differences between public/hosted and local deployments regarding upload limits, processing time, and expected failures.
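
As a starting point for the detection item above, here is a minimal sketch of a size check that logs a warning before rendering. The thresholds, function name, and parameters (`warn_if_above_limits`, `run_names`, `report_sections`) are hypothetical placeholders, not pmultiqc internals; real limits should come from the profiling step.

```python
import logging
import sys

log = logging.getLogger(__name__)

# Assumed thresholds for illustration only; actual values should come
# from profiling large datasets, not from these guesses.
MAX_RECOMMENDED_RUNS = 50
MAX_RECOMMENDED_REPORT_MB = 200


def warn_if_above_limits(run_names, report_sections):
    """Warn when input size exceeds the recommended reporting limits.

    `run_names`: list of run/file identifiers collected so far.
    `report_sections`: in-memory report body (any container).
    Both parameter names are illustrative, not pmultiqc's actual API.
    """
    n_runs = len(run_names)
    # sys.getsizeof is shallow, so this is only a rough lower bound.
    approx_mb = sys.getsizeof(report_sections) / 1e6

    if n_runs > MAX_RECOMMENDED_RUNS:
        log.warning(
            "Report covers %d runs (recommended maximum: %d). The HTML "
            "report may render slowly or exhaust browser memory; consider "
            "summary-only output or splitting the report by batch.",
            n_runs,
            MAX_RECOMMENDED_RUNS,
        )
    if approx_mb > MAX_RECOMMENDED_REPORT_MB:
        log.warning(
            "In-memory report body is roughly %.0f MB (recommended "
            "maximum: %d MB).",
            approx_mb,
            MAX_RECOMMENDED_REPORT_MB,
        )
```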
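And for the splitting/pagination item, a sketch of how runs could be chunked into per-batch reports behind a summary index. `write_report` and `write_summary_index` are hypothetical stand-ins for whatever report-writing entry points pmultiqc exposes, and the batch size of 25 is an arbitrary placeholder:

```python
from itertools import islice


def batched(runs, batch_size=25):
    """Yield fixed-size batches of run identifiers (order preserved)."""
    it = iter(runs)
    while chunk := list(islice(it, batch_size)):
        yield chunk


def write_batched_reports(runs, write_report, write_summary_index):
    """Write one report per batch plus a top-level summary index.

    The two callables are placeholders: `write_report(chunk, output=...)`
    renders one batch, and `write_summary_index(parts)` writes the
    summary-first page that links to each part.
    """
    parts = []
    for i, chunk in enumerate(batched(runs), start=1):
        path = f"pmultiqc_report_part{i}.html"
        write_report(chunk, output=path)
        parts.append(path)
    write_summary_index(parts)
```

Splitting at the report-writing layer keeps each HTML file within browser limits without changing how the metrics themselves are computed.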

Benefits:

  • Makes pmultiqc more robust for large-scale proteomics (high-throughput studies, core facilities, population-scale studies).
  • Improves user trust and confidence (no silent report failures or crashed browsers).
  • Provides concrete material on real-world scalability for methods sections and outreach, plus practical advice for new adopters.
