Release v0.2.46
What's Changed
- Feat: Add `per_stream` batching method to CombinedStreamingDataset by @schopra8 in #438
- Fix parquet cache by @philgzl in #560
- refactor: StreamingDataset variable names for better readability by @deependujha in #557
- feat: Add GitHub Actions workflow for `@benchmark` bot by @deependujha in #561
- fix: `@benchmark` bot fixes by @deependujha in #565
- Fix `IndexError` when resuming after some workers are done by @philgzl in #567
- ref: simplify cache dir creation and remove repeated parts by @bhimrazy in #568
- fix: suppress FileNotFoundError when acquiring file lock for count file by @bhimrazy in #570
- fix: Consolidate Cache Handling + Fix DDP Multi-Indexing for huggingface datasets by @bhimrazy in #569
- update readme to include best practices for image data optimization by @bhimrazy in #577
New Contributors
Full Changelog: v0.2.45...v0.2.46