@ansschh ansschh commented Oct 3, 2025

Adds unit tests for the torch distributed initialization utilities in gpt_oss/torch/utils.py.

Tests Added:

  • suppress_output for different ranks
  • init_distributed with single/multi-GPU
  • CUDA availability and device validation
  • NCCL warmup and error handling
  • Cleanup on exceptions

ansschh added 2 commits August 9, 2025 11:25
- Add return type hint to get_tokenizer()
- Add type hints and checkpoint validation to generate.py main()
- Add parameter type hints to suppress_output() in torch/utils.py

Improves IDE support and catches potential bugs early.
- Test suppress_output for different ranks
- Test init_distributed with single/multi-GPU
- Test CUDA availability and device validation
- Test NCCL warmup and error handling
- Test cleanup on exceptions

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +52 to +55
mock_torch.cuda.is_available.return_value = False

with pytest.raises(RuntimeError, match="CUDA is not available"):
init_distributed()


P1: Align torch utils tests with actual init_distributed behavior

The new tests around init_distributed expect the helper to raise descriptive RuntimeErrors when CUDA is unavailable and to perform cleanup on various failures (e.g. rank out of range, NCCL warm‑up errors). However, gpt_oss/torch/utils.py:init_distributed does not call torch.cuda.is_available, check torch.cuda.device_count, or wrap its warm‑up code in try/except, so it will happily return even when CUDA is mocked as unavailable and will never invoke dist.destroy_process_group. As written, test_init_distributed_cuda_not_available_raises (and the similar tests that follow) will fail because the function under test has no such error handling. Either extend init_distributed to implement these safety checks and cleanup logic or relax the assertions in these tests.
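One way to resolve the mismatch is to extend the helper with the checks and cleanup the tests expect. A hedged sketch follows; the name `init_distributed_checked` and the injected `torch`/`dist` parameters are hypothetical (the real init_distributed takes no such arguments and uses module-level imports), the injection exists only so the sketch can be driven with mocks:

```python
import os

def init_distributed_checked(torch, dist):
    """Sketch of init_distributed with the safety checks the new tests
    assume. `torch` and `dist` are injected for mockability; the actual
    helper in gpt_oss/torch/utils.py imports them at module level."""
    if not torch.cuda.is_available():
        raise RuntimeError("CUDA is not available")
    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    if rank >= torch.cuda.device_count():
        raise RuntimeError(
            f"rank {rank} out of range: only "
            f"{torch.cuda.device_count()} CUDA device(s) visible"
        )
    torch.cuda.set_device(rank)
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
    try:
        # NCCL warm-up: a tiny all_reduce so communicator failures surface now
        dist.all_reduce(torch.ones(1, device=f"cuda:{rank}"))
    except Exception:
        # cleanup on exceptions, as test_init_distributed_* asserts
        dist.destroy_process_group()
        raise
    return torch.device(f"cuda:{rank}")
```

With this shape, a mock whose `cuda.is_available` returns False now produces the "CUDA is not available" RuntimeError the quoted test matches on, and a warm-up failure triggers `dist.destroy_process_group` before re-raising.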

