Merged
…ut an invalid string that isn't parsable, as they include special characters that need escaping with \
…with correct escaping character \, passes all the unit tests now
Rouzbehat78
reviewed
Apr 20, 2026
Contributor
I added a few edge cases to the pytests where the tool_calls_to_pythonic converter was outputting invalid strings that aren't parsable when the string contained a special character (", \n, etc.).
I also pushed a small fix: we now use json.dumps for those cases, which makes sure the output is a parsable string.
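For readers following along, here is a minimal sketch of the idea, not the actual converter from this PR. The function names `format_arg` and `tool_call_to_pythonic_sketch` are hypothetical, as is the choice of `ensure_ascii=False`. It assumes the problem came from wrapping string values in quotes without escaping them, and shows how `json.dumps` produces a quoted literal that Python can parse back.

```python
import ast
import json


def format_arg(value):
    """Render one argument value as a Python-parsable literal (hypothetical helper)."""
    if isinstance(value, str):
        # json.dumps escapes ", \ and control characters such as \n.
        # ensure_ascii=False is an assumption made here so that non-ASCII
        # text is emitted as-is instead of as \uXXXX escapes.
        return json.dumps(value, ensure_ascii=False)
    return repr(value)


def tool_call_to_pythonic_sketch(name, arguments):
    """Build a call string like name(key="value", other=3)."""
    args = ", ".join(f"{key}={format_arg(val)}" for key, val in arguments.items())
    return f"{name}({args})"


# Round-trip check: the generated string parses and the value survives intact.
tricky = 'say "hi"\nback\\slash'
call = tool_call_to_pythonic_sketch("search", {"query": tricky, "limit": 3})
parsed = ast.parse(call, mode="eval").body
assert ast.literal_eval(parsed.keywords[0].value) == tricky
```

Parsing the result with `ast` in a test is one way to catch regressions of this kind, since it fails on any unescaped quote or raw newline.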
    cache_dataset: bool = False
    # Optional preprocessing function: takes Ray Dataset, returns Ray Dataset
    # Applied before validation - use for custom filtering, transforms, joins, etc.
    preprocess_fn: Callable | None = field(default=None, repr=False)
Contributor
Are we using this anywhere?
Collaborator
Author
Agents sometimes use it, but it's relatively harmless on its own.
Collaborator
Author
LGTM, just approve.
Rouzbehat78
approved these changes
Apr 21, 2026
This PR adds tool call format guardrails to the training pipeline.
Pipeline integration is light. The normalization runs as a Ray .map() step between column normalization and filtering — same pattern as existing transforms. Existing files got ~10 lines each. model_name flows from the YAML config through DatasetLoader so the pipeline knows which model family to validate against.
Also re-enables preprocess_fn — users can now specify preprocess_fn: "my_module.my_func" in their YAML config for custom dataset transforms.
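As an illustration of how a dotted string such as "my_module.my_func" can be turned into a callable, here is a minimal sketch using the standard library's `importlib`. The helper name `resolve_dotted_path` is hypothetical, and the PR's actual loading code may differ.

```python
import importlib
from typing import Callable


def resolve_dotted_path(path: str) -> Callable:
    """Import "package.module.attr" and return the attribute, which must be callable."""
    module_name, sep, attr_name = path.rpartition(".")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"expected 'module.function', got {path!r}")
    module = importlib.import_module(module_name)
    fn = getattr(module, attr_name)
    if not callable(fn):
        raise TypeError(f"{path!r} does not point to a callable")
    return fn
```

With something like this, the string from the YAML config would be resolved once at load time. The resulting function would then be applied to the Ray Dataset before validation, matching the "takes Ray Dataset, returns Ray Dataset" contract in the field's comment.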