Conversation

@ansschh ansschh commented Oct 3, 2025

Adds unit tests for tokenizer encoding/decoding functionality.

Tests Added:

  • Basic encoding/decoding
  • Harmony special token handling
  • Round-trip consistency
  • Edge cases and error handling
  • Reserved token range validation
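The round-trip consistency check listed above can be sketched as a small property test. The `encode`/`decode` pair below is a hypothetical byte-level stand-in, not the project's actual tokenizer:

```python
# Round-trip property: decode(encode(s)) == s for any input string.
# This byte-level codec is an illustrative stand-in for the real tokenizer.

def encode(text: str) -> list[int]:
    """Encode a string as a list of UTF-8 byte values (toy tokenizer)."""
    return list(text.encode("utf-8"))

def decode(tokens: list[int]) -> str:
    """Decode a list of byte values back into a string."""
    return bytes(tokens).decode("utf-8")

def test_round_trip() -> None:
    cases = ["", "hello", "caf\u00e9", "emoji \U0001f600", "line\nbreaks\ttabs"]
    for text in cases:
        tokens = encode(text)
        assert all(isinstance(t, int) for t in tokens)
        assert decode(tokens) == text, f"round trip failed for {text!r}"

test_round_trip()
```

The same property-style structure applies to the real tokenizer: any string the tokenizer accepts should survive an encode/decode cycle unchanged, including non-ASCII and control characters.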

ansschh added 3 commits August 9, 2025 11:25
- Add return type hint to get_tokenizer()
- Add type hints and checkpoint validation to generate.py main()
- Add parameter type hints to suppress_output() in torch/utils.py

Improves IDE support and catches potential bugs early.

- Add CUDA availability check before device initialization
- Validate rank against available CUDA device count
- Add device accessibility testing with clear error messages
- Add error handling for distributed communication setup
- Add cleanup for failed distributed process group initialization
- Provide helpful error messages with troubleshooting guidance

This prevents cryptic CUDA errors and provides clear feedback when:
- CUDA is not available
- Invalid device rank is specified
- Device access fails
- Distributed communication fails
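The availability and rank checks described above can be sketched as a standalone preflight function. The function name and message text are illustrative, not the PR's actual code; in the real version the two flags would come from `torch.cuda.is_available()` and `torch.cuda.device_count()`:

```python
# Hypothetical sketch of the device preflight checks described above.
# In real code, cuda_available and device_count would be obtained from
# torch.cuda.is_available() and torch.cuda.device_count().

def validate_device(rank: int, cuda_available: bool, device_count: int) -> str:
    """Return the device string for `rank`, or raise with a clear message."""
    if not cuda_available:
        raise RuntimeError(
            "CUDA is not available. Check your GPU driver and PyTorch build."
        )
    if not 0 <= rank < device_count:
        raise ValueError(
            f"Invalid rank {rank}: only {device_count} CUDA device(s) visible."
        )
    return f"cuda:{rank}"
```

Failing fast here, before any distributed setup runs, is what replaces the cryptic downstream CUDA errors with a single actionable message.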

- Test basic encoding/decoding functionality
- Test Harmony special token handling
- Test round-trip consistency
- Test edge cases and error handling
- Verify reserved token range
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +13 to +17
def main(args: argparse.Namespace) -> None:
    # Validate checkpoint path exists
    checkpoint_path = Path(args.checkpoint)
    if not checkpoint_path.exists():
        raise FileNotFoundError(f"Checkpoint path does not exist: {args.checkpoint}")


P1: Allow non-local checkpoints for vLLM backend

The new preflight check unconditionally wraps args.checkpoint in Path and raises FileNotFoundError when the path does not exist. This happens before the backend switch, so it also triggers when using the vLLM backend, which previously accepted HuggingFace model identifiers or other non-local URIs and downloaded weights on demand. With the change, any string that is not an existing filesystem path now fails before vLLM can handle it, breaking valid use cases such as --backend vllm --checkpoint mistralai/Mixtral-8x7B. Consider moving the existence check inside the torch/triton branches or restricting it to local backends only.
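One way to apply this suggestion is to gate the existence check on the backend. A minimal sketch, assuming hypothetical backend names (`"torch"`, `"triton"`, `"vllm"`) and a helper function that is not part of the PR:

```python
from pathlib import Path

# Assumed backend names for illustration; the real CLI may differ.
LOCAL_BACKENDS = {"torch", "triton"}

def validate_checkpoint(checkpoint: str, backend: str) -> str:
    """Require an existing local path only for local backends.

    Non-local backends (e.g. vLLM) may receive HuggingFace model
    identifiers or remote URIs and resolve them on demand, so the
    filesystem check is skipped for them.
    """
    if backend in LOCAL_BACKENDS and not Path(checkpoint).exists():
        raise FileNotFoundError(f"Checkpoint path does not exist: {checkpoint}")
    return checkpoint
```

With this shape, `--backend vllm --checkpoint mistralai/Mixtral-8x7B` passes through untouched, while the torch/triton paths still fail fast on a missing local checkpoint.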
