[STABLE ABI] Port forced_align #4079
base: main
Conversation
🔗 Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/audio/4079.
Note: links to docs will display an error until the docs builds have completed.
❗ There is 1 currently active SEV; if your PR is affected, please review it on the HUD.
✅ No failures as of commit b308882 with merge base 3b0e7a6.
samanklesaria left a comment:
Looks good to me!
NicolasHug left a comment:
Thanks a lot for the PR, @pearu.
Agreed to move forward with this, and let's try to upstream the components in libtorchaudio/stable as much as possible.
```cpp
  STD_TORCH_CHECK(
      targets.size(1) == torchaudio::util::max<index_t>(targetLengths),
      "target length mismatch");
});
```
On the above, can you help me understand why we need to use STABLE_DISPATCH_INDEX_TYPES now? Basically, which part of at::max(inputLengths).item().toInt() do we need to work around? Is this a permanent workaround, or something we'll be able to simplify in the future when more APIs are ported to the stable part?
Currently, targetLengths can be a tensor with either int32 or int64 items. If one uses, say, torchaudio::util::max<int32_t>(targetLengths) when the targetLengths dtype is int64, the result of max may be wrong because the buffer is read with the wrong element width. Using STABLE_DISPATCH_INDEX_TYPES makes the call to max safe by selecting index_t from the tensor's actual dtype.
We can simplify this in the future, once at::max(inputLengths).item().toInt() can be rewritten using stable ABI methods.
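To make the dtype hazard concrete, here is a minimal, self-contained C++ sketch of the dispatch pattern. It does not use the real STABLE_DISPATCH_INDEX_TYPES macro or torchaudio::util::max; DType, dispatch_index_type, and max_of are hypothetical stand-ins that only illustrate why the element type used for reading the buffer must match the tensor's dtype.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical runtime dtype tag (stand-in for the tensor's scalar type).
enum class DType { Int32, Int64 };

// Invoke `fn` with a value whose type matches `dtype`, so code inside the
// generic lambda can recover the correct index_t via decltype.
template <typename Fn>
void dispatch_index_type(DType dtype, Fn&& fn) {
  switch (dtype) {
    case DType::Int32: fn(int32_t{}); break;
    case DType::Int64: fn(int64_t{}); break;
  }
}

// Max over a type-erased buffer, interpreting it as `index_t` elements.
template <typename index_t>
index_t max_of(const void* data, std::size_t n) {
  const index_t* p = static_cast<const index_t*>(data);
  index_t m = p[0];
  for (std::size_t i = 1; i < n; ++i) m = (p[i] > m) ? p[i] : m;
  return m;
}

int main() {
  // An int64 buffer: reading it as int32 would misinterpret the bytes,
  // which is exactly the failure mode the dispatch avoids.
  std::vector<int64_t> target_lengths = {3, 7, 5};
  DType dtype = DType::Int64;

  dispatch_index_type(dtype, [&](auto tag) {
    using index_t = decltype(tag);
    index_t m = max_of<index_t>(target_lengths.data(), target_lengths.size());
    std::printf("max target length = %lld\n", static_cast<long long>(m));
  });
  return 0;
}
```

The real macro plays the same role: it inspects the dtype of targetLengths and instantiates the body with the matching index_t, so max never reads int64 data through an int32 pointer (or vice versa).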
As in the title.
This PR is marked as Draft because it depends on the torch PR stack starting at pytorch/pytorch#161891 (read: CI failures are expected).
Same as #4022, but using the C++ torch::stable::Tensor API instead of the C shim Tensor API; this PR also includes the GPU port.
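For readers unfamiliar with the stable ABI surface, below is a minimal hedged sketch of what shape checking can look like when written against torch::stable::Tensor instead of at::Tensor. It is not code from this PR: the include paths, helper name, and specific checks are assumptions; only dim()/size() and STD_TORCH_CHECK (which appears in the diff above) are relied on.

```cpp
// Sketch only; the header paths below are assumptions about where the stable
// ABI headers live and may differ across PyTorch versions.
#include <torch/csrc/stable/library.h>
#include <torch/csrc/stable/tensor.h>

using torch::stable::Tensor;

// Hypothetical validation helper in the style of the checks discussed above.
void check_forced_align_inputs(const Tensor& logProbs, const Tensor& targets) {
  STD_TORCH_CHECK(logProbs.dim() == 3, "log_probs must be a 3-D tensor");
  STD_TORCH_CHECK(targets.dim() == 2, "targets must be a 2-D tensor");
  STD_TORCH_CHECK(
      logProbs.size(0) == targets.size(0),
      "batch dimensions of log_probs and targets must match");
}
```

The point of the port is that only this stable surface (and the C shim underneath it) is guaranteed across PyTorch versions, whereas at::Tensor methods such as at::max(...).item().toInt() are not part of that surface, which is why the workaround discussed above is needed.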
PLEASE NOTE THAT THE TORCHAUDIO REPOSITORY IS NO LONGER ACTIVELY MONITORED. You may not get a response. For open discussions, visit https://discuss.pytorch.org/.