Commit 43e13b9

jefe: fix comparison inversion in timeout handling

We recently taught Jefe to impose a restart delay before standing back up a crashing task. However, the implementation contained a logic error that could manifest in certain corner cases: it would actually restart the task _the first time it considered it,_ but only if the deadline hadn't actually elapsed yet. Once the deadline elapsed, Jefe would enter an infinite loop of setting its own timer into the past and responding to the notification, never yielding the CPU.

It turns out this is relatively easy to trigger by putting a task into a fast crashloop -- I hit this when forcing the I2C driver to crash while chasing an unrelated bug.

This commit flips the timestamp comparison logic to implement the restart timeout for reals, causing Jefe to tolerate fast crashloops without taking the system down.
1 parent 7b66c22 commit 43e13b9

1 file changed, +9 -2 lines changed

task/jefe/src/main.rs

Lines changed: 9 additions & 2 deletions
@@ -358,19 +358,26 @@ impl idol_runtime::NotificationHandler for ServerImpl<'_> {
         external::check(self.task_states, now);

         if bits.has_timer_fired(notifications::TIMER_MASK) {
-            // If our timer went off, we need to reestablish it
+            // If our timer went off, we need to reestablish it. Compute a
+            // baseline deadline, which will be adjusted _down_ below when
+            // processing tasks, if necessary.
             if now >= self.deadline {
                 self.deadline = now.wrapping_add(u64::from(TIMER_INTERVAL));
             }
+
             // Check for tasks in timeout, updating our timer deadline
             if core::mem::take(&mut self.any_tasks_in_timeout) {
                 for (index, status) in self.task_states.iter_mut().enumerate() {
                     if let TaskState::Timeout { restart_at } = &status.state {
-                        if *restart_at >= now {
+                        if *restart_at <= now {
+                            // This deadline has elapsed, go ahead and stand it
+                            // back up.
                             kipc::reinit_task(index, true);
                             status.state =
                                 TaskState::Running { started_at: now };
                         } else {
+                            // This deadline remains in the future, min it into
+                            // our next wake time.
                             self.any_tasks_in_timeout = true;
                             self.deadline = self.deadline.min(*restart_at);
                         }
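
The effect of the flipped comparison can be seen in a small standalone model. The following is only a sketch, not Jefe's code: the two-variant `TaskState`, the `process_timeouts` function, its `fixed` switch, and the `TIMER_INTERVAL` value of 100 are all assumptions made for illustration, and the `any_tasks_in_timeout` flag and the `kipc::reinit_task` call are left out. With the original `>=` comparison, a task whose `restart_at` is still in the future is restarted immediately, while one whose `restart_at` has already passed stays in timeout and drags the timer deadline into the past.

```rust
/// Simplified stand-in for Jefe's per-task state (hypothetical; the real
/// type is not fully shown in the diff).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TaskState {
    Running { started_at: u64 },
    Timeout { restart_at: u64 },
}

/// Assumed timer period; the real TIMER_INTERVAL value is not in the diff.
const TIMER_INTERVAL: u64 = 100;

/// Models one pass of the timer handler and returns the new timer deadline.
/// `fixed` selects the corrected comparison (`<=`) or the original (`>=`).
fn process_timeouts(
    states: &mut [TaskState],
    now: u64,
    mut deadline: u64,
    fixed: bool,
) -> u64 {
    // Baseline deadline, adjusted down below if a task needs an earlier wake.
    if now >= deadline {
        deadline = now.wrapping_add(TIMER_INTERVAL);
    }
    for state in states.iter_mut() {
        if let TaskState::Timeout { restart_at } = *state {
            let restart_now = if fixed {
                restart_at <= now
            } else {
                restart_at >= now
            };
            if restart_now {
                // The real code also calls kipc::reinit_task here.
                *state = TaskState::Running { started_at: now };
            } else {
                deadline = deadline.min(restart_at);
            }
        }
    }
    deadline
}

fn main() {
    let now = 1_000;
    for fixed in [false, true] {
        // One task whose restart time has passed, one still waiting.
        let mut states = [
            TaskState::Timeout { restart_at: 950 },
            TaskState::Timeout { restart_at: 1_050 },
        ];
        let deadline = process_timeouts(&mut states, now, 1_000, fixed);
        println!("fixed={fixed}: states={states:?}, deadline={deadline}");
    }
}
```

In this model the original comparison returns a deadline earlier than `now` for the overdue task, which corresponds to the timer being set into the past as described in the commit message. The corrected comparison restarts the overdue task and only ever lowers the deadline to a `restart_at` that is still in the future.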