
Add requeue command #103

Merged
nefrathenrici merged 2 commits into master from ne/requeue on Jan 28, 2026

nefrathenrici commented on Jan 21, 2026

Overview

This PR adds rqrun, a new tool that automatically handles job requeuing and retry logic for the Slurm (sbatch) and PBS (qsub) schedulers. The tool wraps user scripts with signal handling so that a job is automatically resubmitted when it receives a timeout or preemption signal. The number of retries is controlled by the environment variable RQ_RETRY_LIMIT.
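The retry limit described above can be illustrated with a short Python sketch of the decision the wrapper makes each time a signal arrives. Only `RQ_RETRY_LIMIT` comes from the PR; the counter variable `RQ_RETRY_COUNT`, the default limit of 3, and the helper names `should_resubmit` and `next_attempt_env` are hypothetical, because the PR does not say how rqrun names or initializes them.

```python
import os
from typing import Mapping, Optional

# Hypothetical default: the PR does not say what rqrun uses when
# RQ_RETRY_LIMIT is unset.
DEFAULT_RETRY_LIMIT = 3


def should_resubmit(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the job may be resubmitted once more.

    RQ_RETRY_LIMIT is named in the PR; RQ_RETRY_COUNT is a hypothetical
    name for the per-job attempt counter.
    """
    if env is None:
        env = os.environ
    limit = int(env.get("RQ_RETRY_LIMIT", DEFAULT_RETRY_LIMIT))
    count = int(env.get("RQ_RETRY_COUNT", 0))
    return count < limit


def next_attempt_env(env: Mapping[str, str]) -> dict:
    """Return a copy of env with the hypothetical counter incremented."""
    new_env = dict(env)
    new_env["RQ_RETRY_COUNT"] = str(int(env.get("RQ_RETRY_COUNT", 0)) + 1)
    return new_env


if __name__ == "__main__":
    # Simulate repeated timeouts with a limit of 2.
    env = {"RQ_RETRY_LIMIT": "2"}
    attempts = 0
    while should_resubmit(env):
        env = next_attempt_env(env)
        attempts += 1
    print(attempts)  # 2
```

Carrying the counter in the environment keeps the wrapper stateless: each resubmitted job only needs to read one variable to know how many attempts have already been made.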

How It Works

  1. The user calls rqrun sbatch ... script.sh or rqrun qsub ... script.sh
  2. rqrun creates a wrapper script that:
    • Traps timeout/preemption signals (SIGUSR1 for Slurm, SIGTERM for PBS)
    • Runs the user's original script
    • Automatically resubmits the job when one of these signals is received
    • Tracks retry attempts via environment variables
  3. The wrapper script is submitted to the scheduler.
  4. If the job times out or is preempted, the signal is intercepted and the job is resubmitted.
  5. The process repeats until the job completes successfully or the retry limit is reached.
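The steps above can be sketched in Python as a generator for a bash wrapper script. This is not rqrun's actual wrapper: `build_wrapper`, `RQ_RETRY_COUNT`, the default limit, and the wrapper path are hypothetical. The sketch also assumes that the scheduler forwards the submitting environment to the resubmitted job (Slurm's default; with PBS, `qsub -V` requests this) and that Slurm is asked to deliver SIGUSR1 to the batch shell before the time limit, for example with `sbatch --signal=B:USR1@120`. The PR does not say whether rqrun sets either option.

```python
import shlex
from typing import List

# Signal trapped for each scheduler, as listed in the PR description.
TRAP_SIGNALS = {"sbatch": "USR1", "qsub": "TERM"}


def build_wrapper(submit_cmd: List[str], user_script: str,
                  wrapper_path: str, default_limit: int = 3) -> str:
    """Return the text of a hypothetical rqrun-style bash wrapper.

    submit_cmd is the original submission command without the script,
    e.g. ["sbatch", "--time=01:00:00"]; it is replayed on resubmission.
    """
    scheduler = submit_cmd[0]
    if scheduler not in TRAP_SIGNALS:
        raise ValueError("unsupported scheduler: " + scheduler)
    resubmit = shlex.join(submit_cmd + [wrapper_path])
    lines = [
        "#!/bin/bash",
        # RQ_RETRY_COUNT is a hypothetical counter name.
        'RQ_RETRY_COUNT="${RQ_RETRY_COUNT:-0}"',
        'RQ_RETRY_LIMIT="${RQ_RETRY_LIMIT:-%d}"' % default_limit,
        "requeue() {",
        '  if [ "$RQ_RETRY_COUNT" -lt "$RQ_RETRY_LIMIT" ]; then',
        "    export RQ_RETRY_COUNT=$((RQ_RETRY_COUNT + 1))",
        "    " + resubmit,
        "  fi",
        # Stop waiting; the scheduler is about to end this job anyway.
        "  exit 0",
        "}",
        "trap requeue " + TRAP_SIGNALS[scheduler],
        # Background the user's script so the trap can fire during wait.
        "bash " + shlex.quote(user_script) + " &",
        "wait $!",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_wrapper(["sbatch", "--time=01:00:00"], "script.sh",
                        "rq_wrapper.sh"))
```

Running the user's script in the background and blocking in `wait` is deliberate: bash defers a trap until a foreground command finishes, but `wait` returns as soon as a trapped signal arrives, so the handler can resubmit before the scheduler terminates the job.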

nefrathenrici force-pushed the ne/requeue branch 16 times, most recently from d6d0a8d to 3857614, on January 23, 2026 20:06
nefrathenrici merged commit 18034f2 into master on Jan 28, 2026
2 checks passed