TPU Interpret Mode: Question about on_wait behavior #31841

tengyifei · 2025-09-16T16:52:37Z

Hi JAX team (cc @jburnim ),

I'm testing custom Pallas RDMA kernels with TPU Interpret Mode (https://github.com/jax-ml/jax/blob/main/jax/experimental/pallas/g3doc/debugging.md#tpu-interpret-mode). My kernels reuse recv semaphores across multiple pipeline stages/algorithm steps (using capacity/ready semaphores to protect against overrun in the circular buffer of recv semaphores). I've noticed different behavior between DMA execution modes:

- eager mode: works correctly with semaphore reuse
- on_wait mode: TPU Interpret Mode flakily reports race conditions when semaphores are reused
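To make the difference between the two modes concrete, here is a toy Python model. It is not the actual `interpret.py` code, and `ToyDMA` and `ToyDMAEngine` are hypothetical names. In eager mode the copy happens as soon as the DMA is started; in on_wait mode the copy is deferred until something waits on it, and that deferral is what lets DMAs complete in a different order than they were started.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare DMAs by identity, not by field values
class ToyDMA:
    src: list
    dst: list
    done: bool = False

    def execute(self) -> None:
        self.dst[:] = self.src  # copy the payload into the destination in place
        self.done = True


@dataclass
class ToyDMAEngine:
    mode: str  # "eager" or "on_wait"
    pending: list = field(default_factory=list)

    def start(self, dma: ToyDMA) -> None:
        if self.mode == "eager":
            dma.execute()  # the copy happens as soon as the DMA is started
        else:
            self.pending.append(dma)  # the copy is deferred until a wait

    def wait(self, dma: ToyDMA) -> None:
        if not dma.done:
            self.pending.remove(dma)
            dma.execute()


dst = [0, 0, 0]
engine = ToyDMAEngine("on_wait")
dma = ToyDMA(src=[1, 2, 3], dst=dst)
engine.start(dma)
print(dst)  # still [0, 0, 0]: nothing copied yet
engine.wait(dma)
print(dst)  # [1, 2, 3]
```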

Looking at jax/_src/pallas/mosaic/interpret.py, I noticed this TODO around line 576:

jax/jax/_src/pallas/mosaic/interpret.py, lines 576 to 580 in c850bef:

```python
# TODO(jburnim): Fix uses of `dmas_by_sem` to align with the two lines of
# documentation above, i.e. index `dmas_by_sem` with
# `(semaphore_id, device_id)` (currently indexed with `semaphore_id only).
dmas_by_sem: dict[tuple[int, int], list[DMA]] = dataclasses.field(
    default_factory=lambda: collections.defaultdict(list))
```

Currently, `dmas_by_sem` is indexed only by `semaphore_id`, which (IIUC) might cause DMAs from different devices/sections to share the same queue when semaphores are reused. This seems to lead to out-of-order DMA execution in on_wait mode. I'm wondering if that is WAI (working as intended).
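The difference between the two keys can be sketched with plain dictionaries. This assumes, as the TODO's suggested `(semaphore_id, device_id)` key implies, that the same `semaphore_id` can refer to different semaphores on different devices. `ToyDMARecord` and `group_dmas` are hypothetical names, and the sketch only shows how DMAs get grouped; it says nothing about whether that grouping causes a race.

```python
import collections
from typing import Callable, Hashable, NamedTuple


class ToyDMARecord(NamedTuple):
    name: str
    semaphore_id: int
    device_id: int


def group_dmas(dmas, key_fn: Callable[[ToyDMARecord], Hashable]) -> dict:
    """Group DMA names into queues keyed by key_fn (mimics a defaultdict(list))."""
    groups = collections.defaultdict(list)
    for dma in dmas:
        groups[key_fn(dma)].append(dma.name)
    return dict(groups)


dmas = [
    ToyDMARecord("rdma_to_dev0", semaphore_id=7, device_id=0),
    ToyDMARecord("rdma_to_dev1", semaphore_id=7, device_id=1),
]

# Keyed by semaphore_id only: both DMAs land in one queue.
by_sem = group_dmas(dmas, lambda d: d.semaphore_id)
# Keyed by (semaphore_id, device_id): each device gets its own queue.
by_sem_and_dev = group_dmas(dmas, lambda d: (d.semaphore_id, d.device_id))
print(by_sem)
print(by_sem_and_dev)
```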

Questions:

1. Is this TODO related to the issues I'm seeing with semaphore reuse in on_wait mode?
2. Are there known limitations or false positives in TPU Interpret Mode's race detection when semaphores are reused?

Using distinct semaphores (i.e. more than the number of steps) eliminates the issues, but this significantly increases semaphore usage. Any guidance would be appreciated!
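The trade-off between reuse and semaphore count can be sketched as a slot-assignment check. Assuming (hypothetically) that step `i` uses recv semaphore slot `i % num_slots`, two steps collide exactly when both are in flight and their step numbers differ by a multiple of `num_slots`. In the scheme described above, the capacity/ready semaphores are what should keep the number of in-flight steps at or below `num_slots`. The function names are illustrative only.

```python
def slot_for_step(step: int, num_slots: int) -> int:
    """Hypothetical policy: step i reuses the recv semaphore in slot i % num_slots."""
    return step % num_slots


def find_slot_conflicts(in_flight_steps, num_slots: int):
    """Return (earlier_step, later_step) pairs of in-flight steps sharing a recv semaphore."""
    first_step_in_slot = {}
    conflicts = []
    for step in sorted(in_flight_steps):
        slot = slot_for_step(step, num_slots)
        if slot in first_step_in_slot:
            conflicts.append((first_step_in_slot[slot], step))
        else:
            first_step_in_slot[slot] = step
    return conflicts


# One semaphore per step: never a conflict, but semaphore count grows with the step count.
print(find_slot_conflicts(range(4), num_slots=4))  # []
# Two reused slots with three steps in flight: steps 0 and 2 share slot 0.
print(find_slot_conflicts([0, 1, 2], num_slots=2))  # [(0, 2)]
# Two reused slots, at most two steps in flight: no conflict.
print(find_slot_conflicts([2, 3], num_slots=2))  # []
```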

Thanks!

Answered by jburnim

jburnim (Collaborator) · 2025-09-16T23:43:03Z

A data race detected by TPU Interpret Mode should never be a false positive. (There are no known issues here, but it is possible there is a bug that is permitting false positives.)

I suspect that some later stage/step's RDMA is signaling a semaphore while an earlier stage/step is still waiting for an earlier RDMA to signal the same semaphore, and this is leading to a real race.

If a second RDMA is started before an earlier RDMA using the same send/receive semaphores has completed, Pallas permits the second RDMA to complete and signal the semaphores before the first RDMA. But this will only happen in TPU Interpret Mode with dma_execution_mode="on_wait", which is why this kind of race is much more likely to happen with on_wait than with eager.
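A toy counting semaphore shows why this is a real race rather than a false positive. In this hypothetical model (not Pallas's actual semaphore implementation), RDMAs for steps 0 and 1 write separate buffers but signal the same recv semaphore. If step 1's RDMA lands first, step 0's wait still succeeds, because the semaphore does not record which RDMA signaled it, and step 0 then reads a buffer that has not been written yet.

```python
class ToySemaphore:
    """A hypothetical counting semaphore: signals add 1, a successful wait subtracts 1."""

    def __init__(self) -> None:
        self.value = 0

    def signal(self) -> None:
        self.value += 1

    def try_wait(self) -> bool:
        if self.value > 0:
            self.value -= 1
            return True
        return False


def consumer_reads_for_step0(completion_order):
    """Two RDMAs (steps 0 and 1) write separate buffers but share one recv semaphore.

    Only the first RDMA in `completion_order` has landed when step 0's consumer waits.
    """
    sem = ToySemaphore()
    buffers = {0: None, 1: None}
    landed = completion_order[0]
    buffers[landed] = f"data{landed}"
    sem.signal()
    # Step 0's wait succeeds either way: the semaphore does not record which RDMA signaled it.
    if not sem.try_wait():
        raise RuntimeError("step 0's wait would block")
    return buffers[0]  # what step 0's consumer actually reads


print(consumer_reads_for_step0((0, 1)))  # data0: in-order completion
print(consumer_reads_for_step0((1, 0)))  # None: step 0 reads an unwritten buffer
```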

(That TODO is related only in that: (a) dmas_by_sem is one of the structures used to delay executing RDMAs and to execute them out of order, and (b) that TODO is for an improvement to the structure that will allow delaying some RDMAs even more.)

tengyifei (Author) · Sep 18, 2025

Thanks! I found the root cause of the race condition, and it was indeed two RDMAs trampling on the same receive semaphore, which was itself caused by improper management of the ready/capacity semaphores.
