@VijayVignesh1 VijayVignesh1 commented Oct 24, 2025

Before submitting
  • Was this discussed/agreed via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Fixes #317

PR review

Adds support for multi-sample items.

It adds a `sample_count` parameter that, given a single transform function, creates a batch of sub-samples for each sample.

Note:
Multi-sample behavior applies only when the transform is passed to the
StreamingDataset constructor (i.e., via the `transform` argument),
not when a subclass overrides `__init__`.

Sample code:

    from litdata import StreamingDataset

    def transform_fn_sq(x, sample_idx, *args, **kwargs):
        """A simple transform function that scales the input by the sub-sample index."""
        return x * sample_idx

    dataset = StreamingDataset(
        data_dir,
        cache_dir=str(cache_dir),
        shuffle=False,
        transform=[transform_fn_sq],
        sample_count=3,
    )
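
To make the intended effect of `sample_count` concrete, here is a minimal pure-Python sketch of how one sample could expand into several sub-samples. It does not use litdata and is not the PR's implementation: `expand_sample` is a hypothetical helper, and the sketch assumes `sample_idx` runs from 0 to `sample_count - 1`, which the description above does not state.

```python
from typing import Any, Callable, List


def expand_sample(
    x: Any,
    transform: Callable[..., Any],
    sample_count: int,
) -> List[Any]:
    """Apply `transform` once per sub-sample index and collect the results.

    Hypothetical helper, not part of litdata. Assumes sample_idx runs
    from 0 to sample_count - 1.
    """
    return [transform(x, sample_idx) for sample_idx in range(sample_count)]


def transform_fn(x, sample_idx, *args, **kwargs):
    """Scale the input by the sub-sample index (mirrors the sample transform)."""
    return x * sample_idx


raw_samples = [10, 20]
sub_samples = [expand_sample(x, transform_fn, sample_count=3) for x in raw_samples]
print(sub_samples)  # [[0, 10, 20], [0, 20, 40]]
```

Under these assumptions each raw sample produces three sub-samples. Whether the dataset returns them as one batch or yields them individually is up to the actual implementation in this PR.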

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@VijayVignesh1 VijayVignesh1 force-pushed the feature/add_multisample_support branch from 1b01b6f to 6a77302 Compare October 24, 2025 20:12
@VijayVignesh1
Contributor Author

@tchaton @deependujha @bhimrazy Could you review the approach? Once it is confirmed, I will update the README.

codecov bot commented Oct 29, 2025

Codecov Report

❌ Patch coverage is 84.21053% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 80%. Comparing base (b070032) to head (229ff5b).

Additional details and impacted files
@@         Coverage Diff         @@
##           main   #740   +/-   ##
===================================
- Coverage    80%    80%   -0%     
===================================
  Files        52     52           
  Lines      7343   7357   +14     
===================================
- Hits       5885   5876    -9     
- Misses     1458   1481   +23     

@VijayVignesh1 VijayVignesh1 marked this pull request as ready for review November 3, 2025 21:19

Development

Successfully merging this pull request may close these issues.

Add support for multi sample item in optimize and yielding from the _getitem_ of the StreamingDataset
