
Add input files metrics to Seqera executor task submission #6790

Draft
pditommaso wants to merge 2 commits into sched from input-files-metrics

@pditommaso

Summary

  • Add InputFilesComputer to compute file metrics from task input files
  • Modify SeqeraBatchSubmitter for async metrics computation using a dedicated thread pool
  • Include input file statistics in task submission payload to Sched API
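
A rough sketch of the async flow summarized above: computation starts when a task is enqueued and is only awaited when the batch is flushed, with a bounded wait so slow metrics never block submission. The class and method names here are hypothetical and are not the actual SeqeraBatchSubmitter API; the pool size of 10 comes from the commit message, and the timeout is applied per task in this sketch, which is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical sketch: names are illustrative, not the PR's SeqeraBatchSubmitter API.
final class AsyncMetricsSketch<M> implements AutoCloseable {
    // Dedicated pool (10 threads, per the commit message) so metrics work
    // does not run on the threads that submit tasks.
    private final ExecutorService pool = Executors.newFixedThreadPool(10);
    private final List<CompletableFuture<M>> pending = new ArrayList<>();
    private final long timeoutMillis;

    AsyncMetricsSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called when a task is queued: start computing, do not wait.
    synchronized void enqueue(Supplier<M> computation) {
        pending.add(CompletableFuture.supplyAsync(computation, pool));
    }

    // Called when the batch is sent: wait up to the timeout for each result.
    // A slow or failed computation yields null so the submission still proceeds.
    synchronized List<M> flush() {
        List<M> results = new ArrayList<>();
        for (CompletableFuture<M> future : pending) {
            try {
                results.add(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
            } catch (TimeoutException | ExecutionException e) {
                // Marks the future as abandoned; it does not interrupt the worker.
                future.cancel(true);
                results.add(null);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                results.add(null);
            }
        }
        pending.clear();
        return results;
    }

    @Override
    public void close() {
        // Interrupts any worker that is still running.
        pool.shutdownNow();
    }
}
```

Returning null for a missing result keeps the example short; a real implementation would more likely omit the metrics field from the payload.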

Changes

| File | Change |
|------|--------|
| InputFilesComputer.groovy | New class to compute file count, total size, and size distribution |
| SeqeraBatchSubmitter.groovy | Async metrics computation with configurable timeout |
| InputFilesComputerTest.groovy | Unit tests for the computer class |
| build.gradle | Update sched-client to 0.16.0 |

Metrics Payload Structure

{
  "inputFilesMetrics": {
    "count": 12,
    "totalBytes": 4500000000,
    "bins": [
      {"range": "<=1MB", "count": 2},
      {"range": "<=10MB", "count": 5},
      {"range": "<=100MB", "count": 3},
      {"range": "<=1GB", "count": 2},
      {"range": ">1GB", "count": 0}
    ]
  }
}
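
For illustration, the bins above can be derived from a list of file sizes roughly as follows. In the example payload the bin counts add up to `count` (2 + 5 + 3 + 2 + 0 = 12), so the bins are treated as disjoint ranges. This is a sketch, not the InputFilesComputer code: the class name is hypothetical, and binary units (1 MB = 1024 * 1024 bytes) are an assumption since the PR does not state which convention is used.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the size-distribution step; not the PR's InputFilesComputer.
final class SizeBinsSketch {
    // Assumption: binary units. The PR does not say whether 1 MB is 10^6 or 2^20 bytes.
    static final long MB = 1024L * 1024L;
    static final String[] LABELS = {"<=1MB", "<=10MB", "<=100MB", "<=1GB", ">1GB"};
    // Inclusive upper bound of every bin except the last, open-ended one.
    static final long[] UPPER = {MB, 10 * MB, 100 * MB, 1024 * MB};

    // Counts each size in exactly one bin, keeping the label order of the payload.
    static Map<String, Long> bins(List<Long> sizes) {
        Map<String, Long> out = new LinkedHashMap<>();
        for (String label : LABELS) {
            out.put(label, 0L);
        }
        for (long size : sizes) {
            int i = 0;
            while (i < UPPER.length && size > UPPER[i]) {
                i++;
            }
            out.merge(LABELS[i], 1L, Long::sum);
        }
        return out;
    }

    // Sum of all sizes, corresponding to "totalBytes".
    static long totalBytes(List<Long> sizes) {
        return sizes.stream().mapToLong(Long::longValue).sum();
    }
}
```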

Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| NXF_SEQERA_METRICS_TIMEOUT | 30 sec | Timeout for metrics computation |
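
As a sketch of how the timeout could be resolved with its 30-second default: the PR does not spell out the accepted value format, so the parser below assumes a whole number of seconds with an optional `s` or `sec` suffix and falls back to the default on anything else. The class and method names are hypothetical.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the value format accepted by the real plugin is not stated in the PR.
final class MetricsTimeoutSketch {
    static final Duration DEFAULT = Duration.ofSeconds(30);
    // Assumed format: whole seconds, optionally followed by "s" or "sec".
    private static final Pattern SECONDS = Pattern.compile("^\\s*(\\d+)\\s*(s|sec)?\\s*$");

    // Returns the parsed timeout, or the default when the value is missing or malformed.
    static Duration parse(String raw) {
        if (raw == null) {
            return DEFAULT;
        }
        Matcher m = SECONDS.matcher(raw);
        if (!m.matches()) {
            return DEFAULT;
        }
        try {
            return Duration.ofSeconds(Long.parseLong(m.group(1)));
        } catch (NumberFormatException e) {
            // Digits only, but too large for a long.
            return DEFAULT;
        }
    }

    // Reads the variable named in the table above.
    static Duration fromEnvironment() {
        return parse(System.getenv("NXF_SEQERA_METRICS_TIMEOUT"));
    }
}
```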

Test plan

  • Unit tests for InputFilesComputer
  • Integration test with actual Sched API

🤖 Generated with Claude Code

pditommaso and others added 2 commits January 30, 2026 12:56
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
This change adds telemetry about input files (count, total size, and size
distribution) to the task submission payload sent to the Sched API.

Key changes:
- Add InputFilesComputer to compute file metrics from TaskRun.inputFiles
  - Follows symlinks and recursively computes directory sizes
  - Logs a warning on access failures and degrades gracefully
- Modify SeqeraBatchSubmitter for async metrics computation:
  - Uses dedicated thread pool (max 10 threads) for parallel computation
  - Computation starts at enqueue() and is resolved at flush time
  - Configurable timeout via NXF_SEQERA_METRICS_TIMEOUT (default 30s)
- Update sched-client dependency to 0.16.0
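
A minimal sketch of the size computation described in the list above (follow symlinks, recurse into directories, warn and carry on when a path cannot be read). It uses only java.nio.file; the class name is hypothetical, and returning 0 for an unreadable path is an assumption about what graceful degradation means here, not the PR's confirmed behavior.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitOption;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical sketch; not the PR's InputFilesComputer.
final class PathSizeSketch {

    // Size in bytes of a file, or the sum of all regular files under a directory.
    // Symlinks are followed. On failure a warning is printed and 0 is returned
    // (assumption: the real code may keep a partial total instead).
    static long sizeOf(Path path) {
        try (Stream<Path> walk = Files.walk(path, FileVisitOption.FOLLOW_LINKS)) {
            return walk.filter(Files::isRegularFile)
                       .mapToLong(PathSizeSketch::fileSize)
                       .sum();
        } catch (IOException | UncheckedIOException e) {
            // Covers missing paths, permission errors and symlink loops.
            System.err.println("WARN: cannot access " + path + ": " + e.getMessage());
            return 0L;
        }
    }

    // Size of one file; unreadable files count as 0 so the walk can continue.
    private static long fileSize(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            System.err.println("WARN: cannot read size of " + file + ": " + e.getMessage());
            return 0L;
        }
    }
}
```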

The metrics payload structure:
{
  "inputFilesMetrics": {
    "count": 12,
    "totalBytes": 4500000000,
    "bins": [
      {"range": "<=1MB", "count": 2},
      {"range": "<=10MB", "count": 5},
      ...
    ]
  }
}

Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>