Add direct XYWH-CXCYWH conversion for better performance #9326
raimbekovm wants to merge 8 commits into pytorch:main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9326
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit 8517333 with merge base ec4f95a:
FLAKY - The following job failed but was likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
zy1git left a comment:
Thanks for the PR. I left a comment for a minor thing. Feel free to take a look.
@raimbekovm Thanks a lot for this PR. Could you please add a parity test against the reference implementation (the one that requires the intermediate XYXY conversion)? Also, in your description you mentioned "Performance improvement: ~2.3x faster for XYWH↔CXCYWH conversions." Could you please share the benchmark script or benchmark configuration (shapes etc.) showing how you got 2.3x faster? I also noticed that you submitted PR #9322. If that PR is correct (I am still reviewing it), a similar fix can be applied here for that function.
@zy1git Thanks for the review! Here are the benchmark results and script.

Benchmark Results
Average speedup: ~1.9x (ranges from 1.5x to 2.3x depending on tensor size and dtype) The speedup comes from avoiding:
Benchmark Script

```python
import time

import torch

# Conversion functions from torchvision/transforms/v2/functional/_meta.py
def _xywh_to_xyxy(xywh, inplace):
    xyxy = xywh if inplace else xywh.clone()
    xyxy[..., 2:] += xyxy[..., :2]
    return xyxy

def _xyxy_to_cxcywh(xyxy, inplace):
    if not inplace:
        xyxy = xyxy.clone()
    xyxy[..., 2:].sub_(xyxy[..., :2])
    xyxy[..., :2].mul_(2).add_(xyxy[..., 2:]).div_(
        2, rounding_mode=None if xyxy.is_floating_point() else "floor"
    )
    return xyxy

def _cxcywh_to_xyxy(cxcywh, inplace):
    if not inplace:
        cxcywh = cxcywh.clone()
    half_wh = cxcywh[..., 2:].div(
        -2, rounding_mode=None if cxcywh.is_floating_point() else "floor"
    ).abs_()
    cxcywh[..., :2].sub_(half_wh)
    cxcywh[..., 2:].add_(cxcywh[..., :2])
    return cxcywh

def _xyxy_to_xywh(xyxy, inplace):
    xywh = xyxy if inplace else xyxy.clone()
    xywh[..., 2:] -= xywh[..., :2]
    return xywh

# New direct conversion functions
def _xywh_to_cxcywh(xywh, inplace):
    if not inplace:
        xywh = xywh.clone()
    xywh[..., :2].add_(
        xywh[..., 2:].div(2, rounding_mode=None if xywh.is_floating_point() else "floor")
    )
    return xywh

def _cxcywh_to_xywh(cxcywh, inplace):
    if not inplace:
        cxcywh = cxcywh.clone()
    half_wh = cxcywh[..., 2:].div(
        -2, rounding_mode=None if cxcywh.is_floating_point() else "floor"
    ).abs_()
    cxcywh[..., :2].sub_(half_wh)
    return cxcywh

def benchmark(func, data, warmup=10, iterations=100):
    for _ in range(warmup):
        func(data.clone(), False)  # positional call, since the lambdas below take (x, inplace)
    start = time.perf_counter()
    for _ in range(iterations):
        func(data.clone(), False)
    return (time.perf_counter() - start) / iterations * 1000  # ms per call

# Run benchmark
for num_boxes, dtype in [(100, torch.float32), (1000, torch.float32), (10000, torch.float32), (1000, torch.int64)]:
    data = (
        torch.rand(num_boxes, 4, dtype=dtype) * 1000
        if dtype.is_floating_point
        else torch.randint(0, 1000, (num_boxes, 4), dtype=dtype)
    )
    direct = benchmark(lambda x, inplace: _xywh_to_cxcywh(x, inplace), data)
    two_step = benchmark(lambda x, inplace: _xyxy_to_cxcywh(_xywh_to_xyxy(x, False), False), data)
    print(f"{num_boxes} boxes, {dtype}: direct={direct:.4f}ms, two-step={two_step:.4f}ms, speedup={two_step/direct:.2f}x")
```

I've also added the parity test in the latest commit (992ef20). Regarding PR #9322, I'll apply the similar fix to that function as well.
zy1git left a comment on the test:
test/test_transforms_v2.py (Outdated)

```python
    ("old_format", "new_format"),
    [
        (tv_tensors.BoundingBoxFormat.XYWH, tv_tensors.BoundingBoxFormat.CXCYWH),
        (tv_tensors.BoundingBoxFormat.CXCYWH, tv_tensors.BoundingBoxFormat.XYWH),
    ],
)
```
The "new_format" is not passed to the test function. We can remove it, correct?
Good point on both!
@raimbekovm The new commit looks great! Thank you for your contribution! However, it seems that the CI test fails because of a lint problem: the ufmt formatter expects lines 184-186 of _meta.py to be a single line instead of being split across three lines. You don't need to fix this manually; if you follow the contributing guide to set up your env, ufmt will apply the lint fix automatically when you commit. Could you try that?
Fixed! Thanks for the hint about ufmt.
Summary
Implements direct conversion between XYWH and CXCYWH bounding box formats, removing the need for an intermediate XYXY conversion. This resolves the TODO comment at line 307 in _meta.py.

Performance improvement: ~2.3x faster for XYWH↔CXCYWH conversions.
Changes
- `_xywh_to_cxcywh()` function for direct XYWH → CXCYWH conversion
- `_cxcywh_to_xywh()` function for direct CXCYWH → XYWH conversion
- `_convert_bounding_box_format()` updated to use direct conversion when possible

Both functions correctly handle integer tensors using the same rounding behavior as the two-step conversion path.
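As a hedged illustration of the "same rounding behavior" claim, here is a plain-Python model of the CXCYWH → XYWH direction (hypothetical helpers, not the tensor code; `abs(w // -2)` plays the role of `div(-2, rounding_mode="floor").abs_()`, i.e. ceil(w/2) for non-negative w):

```python
import random

def cxcywh_to_xywh_two_step(cx, cy, w, h):
    # CXCYWH -> XYXY -> XYWH for integer boxes
    half_w, half_h = abs(w // -2), abs(h // -2)  # ceil(w/2), ceil(h/2)
    x1, y1 = cx - half_w, cy - half_h            # _cxcywh_to_xyxy step
    x2, y2 = x1 + w, y1 + h
    return (x1, y1, x2 - x1, y2 - y1)            # _xyxy_to_xywh recovers w, h exactly

def cxcywh_to_xywh_direct(cx, cy, w, h):
    # Direct path: subtract the same ceil-half and keep w, h untouched
    half_w, half_h = abs(w // -2), abs(h // -2)
    return (cx - half_w, cy - half_h, w, h)

random.seed(1)
for _ in range(1000):
    cx, cy, w, h = (random.randint(0, 1000) for _ in range(4))
    assert cxcywh_to_xywh_two_step(cx, cy, w, h) == cxcywh_to_xywh_direct(cx, cy, w, h)
print("integer parity holds for CXCYWH -> XYWH")
```

Because the two-step path recomputes w as x2 - x1 after adding and subtracting the same half-extent, the direct path produces the identical result while skipping two tensor passes.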