Director-v2 restarts a computational job if it briefly loses and then regains its connection to a private cluster #6793

## Scenario
1. A computational pipeline is started on a private cluster.
2. The pipeline is scheduled on the private cluster, e.g. the task status is set to `STARTED`.
3. For a short time, the dask-scheduler on the private cluster is not reachable.
4. During that time, dv-2 checks the task status via its dask client; the check fails to connect and returns `UNKNOWN`.
5. dv-2 sets the task back to `WAITING_FOR_CLUSTER`.
6. On the next iteration of the scheduler, the dask-scheduler is reachable again.

--> dv-2 starts the task again, because it does not check whether the task is already running on the cluster.
--> the private cluster runs the task twice, potentially running it longer than needed and wasting time and money.
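
The scenario above can be reproduced with a small, self-contained simulation. This is only a sketch, not dv-2 code: `FakeCluster`, `schedule_once`, `run_scenario`, and the `check_running` flag are hypothetical names made up for illustration, and the real scheduler and dask client are more involved. It counts how often the task is submitted, with and without a "is it already running?" check before resubmission.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class State(str, Enum):
    WAITING_FOR_CLUSTER = "WAITING_FOR_CLUSTER"
    STARTED = "STARTED"
    UNKNOWN = "UNKNOWN"


@dataclass
class FakeCluster:
    """Hypothetical stand-in for the dask-scheduler on the private cluster."""

    reachable: bool = True
    running: set[str] = field(default_factory=set)
    submissions: list[str] = field(default_factory=list)

    def submit(self, task_id: str) -> None:
        self.submissions.append(task_id)
        self.running.add(task_id)

    def status(self, task_id: str) -> State:
        if not self.reachable:
            # A failed connection is reported as UNKNOWN (step 4).
            return State.UNKNOWN
        if task_id in self.running:
            return State.STARTED
        return State.WAITING_FOR_CLUSTER


def schedule_once(
    cluster: FakeCluster, task_id: str, db_state: State, *, check_running: bool
) -> State:
    """One iteration of a simplified scheduling loop; returns the new stored state."""
    if db_state is State.STARTED:
        observed = cluster.status(task_id)
        if observed is State.UNKNOWN:
            return State.WAITING_FOR_CLUSTER  # step 5
        return observed
    # Stored state is WAITING_FOR_CLUSTER from here on.
    if not cluster.reachable:
        return State.WAITING_FOR_CLUSTER
    if check_running and cluster.status(task_id) is State.STARTED:
        # Assumed guard: the task survived the outage, so re-attach to it.
        return State.STARTED
    cluster.submit(task_id)
    return State.STARTED


def run_scenario(*, check_running: bool) -> int:
    """Replays steps 1-6 and returns how many times the task was submitted."""
    cluster = FakeCluster()
    state = State.WAITING_FOR_CLUSTER
    state = schedule_once(cluster, "task-1", state, check_running=check_running)  # steps 1-2
    cluster.reachable = False  # step 3
    state = schedule_once(cluster, "task-1", state, check_running=check_running)  # steps 4-5
    cluster.reachable = True  # step 6
    schedule_once(cluster, "task-1", state, check_running=check_running)
    return len(cluster.submissions)
```

Calling `run_scenario(check_running=False)` follows the behaviour described in this issue and submits the task twice; with `check_running=True` the task is submitted only once, because the loop first asks the cluster whether the task is still running.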

