Add direct XYWH-CXCYWH conversion for better performance #9326
raimbekovm wants to merge 8 commits into pytorch:main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/9326
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit 8517333 with merge base ec4f95a:
FLAKY - The following job failed but was likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
zy1git left a comment:
Thanks for the PR. I left a comment for a minor thing. Feel free to take a look.
@raimbekovm Thanks a lot for this PR. Could you please add a parity test against the reference implementation (the one that requires the intermediate XYXY conversion)? Also, in your description you mentioned "Performance improvement: ~2.3x faster for XYWH↔CXCYWH conversions." Could you please share the benchmark script or benchmark configuration (shapes etc.) showing how you got 2.3x faster? I also noticed that you submitted PR #9322. If that PR is correct (I am still reviewing it), a similar fix can be applied here for that function.
@zy1git Thanks for the review! Here are the benchmark results and script.

Benchmark Results
Average speedup: ~1.9x (ranges from 1.5x to 2.3x depending on tensor size and dtype) The speedup comes from avoiding:
Benchmark Script

```python
import time

import torch

# Conversion functions from torchvision/transforms/v2/functional/_meta.py
def _xywh_to_xyxy(xywh, inplace):
    xyxy = xywh if inplace else xywh.clone()
    xyxy[..., 2:] += xyxy[..., :2]
    return xyxy

def _xyxy_to_cxcywh(xyxy, inplace):
    if not inplace:
        xyxy = xyxy.clone()
    xyxy[..., 2:].sub_(xyxy[..., :2])
    xyxy[..., :2].mul_(2).add_(xyxy[..., 2:]).div_(
        2, rounding_mode=None if xyxy.is_floating_point() else "floor"
    )
    return xyxy

def _cxcywh_to_xyxy(cxcywh, inplace):
    if not inplace:
        cxcywh = cxcywh.clone()
    half_wh = cxcywh[..., 2:].div(
        -2, rounding_mode=None if cxcywh.is_floating_point() else "floor"
    ).abs_()
    cxcywh[..., :2].sub_(half_wh)
    cxcywh[..., 2:].add_(cxcywh[..., :2])
    return cxcywh

def _xyxy_to_xywh(xyxy, inplace):
    xywh = xyxy if inplace else xyxy.clone()
    xywh[..., 2:] -= xywh[..., :2]
    return xywh

# New direct conversion functions
def _xywh_to_cxcywh(xywh, inplace):
    if not inplace:
        xywh = xywh.clone()
    xywh[..., :2].add_(
        xywh[..., 2:].div(2, rounding_mode=None if xywh.is_floating_point() else "floor")
    )
    return xywh

def _cxcywh_to_xywh(cxcywh, inplace):
    if not inplace:
        cxcywh = cxcywh.clone()
    half_wh = cxcywh[..., 2:].div(
        -2, rounding_mode=None if cxcywh.is_floating_point() else "floor"
    ).abs_()
    cxcywh[..., :2].sub_(half_wh)
    return cxcywh

def benchmark(func, data, warmup=10, iterations=100):
    for _ in range(warmup):
        func(data.clone(), False)  # positional call, since the lambdas below take (x, inplace)
    start = time.perf_counter()
    for _ in range(iterations):
        func(data.clone(), False)
    return (time.perf_counter() - start) / iterations * 1000  # ms per call

# Run benchmark
for num_boxes, dtype in [(100, torch.float32), (1000, torch.float32), (10000, torch.float32), (1000, torch.int64)]:
    data = (
        torch.rand(num_boxes, 4, dtype=dtype) * 1000
        if dtype.is_floating_point
        else torch.randint(0, 1000, (num_boxes, 4), dtype=dtype)
    )
    direct = benchmark(lambda x, inplace: _xywh_to_cxcywh(x, inplace), data)
    two_step = benchmark(lambda x, inplace: _xyxy_to_cxcywh(_xywh_to_xyxy(x, False), False), data)
    print(f"{num_boxes} boxes, {dtype}: direct={direct:.4f}ms, two-step={two_step:.4f}ms, speedup={two_step/direct:.2f}x")
```

I've also added the parity test in the latest commit (992ef20). Regarding PR #9322, I'll apply the similar fix to that function as well.
zy1git left a comment on the test:
test/test_transforms_v2.py (Outdated)

```python
    ("old_format", "new_format"),
    [
        (tv_tensors.BoundingBoxFormat.XYWH, tv_tensors.BoundingBoxFormat.CXCYWH),
        (tv_tensors.BoundingBoxFormat.CXCYWH, tv_tensors.BoundingBoxFormat.XYWH),
    ],
)
```
The "new_format" is not passed to the test function. We can remove it, correct?
Good point on both!
@raimbekovm The new commit looks great! Thank you for your contribution! However, it seems that the CI test fails because of a lint problem: the ufmt formatter expects lines 184-186 of _meta.py to be a single line instead of being split across three lines. You don't need to fix this manually; if you follow the contributing guide to set up your env, ufmt will apply the lint fix automatically when you commit. Could you try that?
Fixed! Thanks for the hint about ufmt.
Summary
Implements direct conversion between XYWH and CXCYWH bounding box formats, removing the need for an intermediate XYXY conversion. This resolves the TODO comment at line 307 in _meta.py.

Performance improvement: ~2.3x faster for XYWH↔CXCYWH conversions.
Changes
- `_xywh_to_cxcywh()` function for direct XYWH → CXCYWH conversion
- `_cxcywh_to_xywh()` function for direct CXCYWH → XYWH conversion
- `_convert_bounding_box_format()` updated to use direct conversion when possible

Both functions correctly handle integer tensors using the same rounding behavior as the two-step conversion path.
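As a hedged illustration of the "same rounding behavior" claim, here is a plain-Python model of the CXCYWH → XYWH direction (hypothetical helpers, not the tensor code; `abs(w // -2)` plays the role of `div(-2, rounding_mode="floor").abs_()`, i.e. ceil(w/2) for non-negative w):

```python
import random

def cxcywh_to_xywh_two_step(cx, cy, w, h):
    # CXCYWH -> XYXY -> XYWH for integer boxes
    half_w, half_h = abs(w // -2), abs(h // -2)  # ceil(w/2), ceil(h/2)
    x1, y1 = cx - half_w, cy - half_h            # _cxcywh_to_xyxy step
    x2, y2 = x1 + w, y1 + h
    return (x1, y1, x2 - x1, y2 - y1)            # _xyxy_to_xywh recovers w, h exactly

def cxcywh_to_xywh_direct(cx, cy, w, h):
    # Direct path: subtract the same ceil-half and keep w, h untouched
    half_w, half_h = abs(w // -2), abs(h // -2)
    return (cx - half_w, cy - half_h, w, h)

random.seed(1)
for _ in range(1000):
    cx, cy, w, h = (random.randint(0, 1000) for _ in range(4))
    assert cxcywh_to_xywh_two_step(cx, cy, w, h) == cxcywh_to_xywh_direct(cx, cy, w, h)
print("integer parity holds for CXCYWH -> XYWH")
```

Because the two-step path recomputes w as x2 - x1 after adding and subtracting the same half-extent, the direct path produces the identical result while skipping two tensor passes.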