Release v0.2.46

@bhimrazy released this 03 May 07:58
· 127 commits to main since this release
96238b6

What's Changed

  • Feat: Add per_stream batching method to CombinedStreamingDataset by @schopra8 in #438
  • Fix parquet cache by @philgzl in #560
  • refactor: StreamingDataset variable names for better readability by @deependujha in #557
  • feat: Add GitHub Actions workflow for @benchmark bot by @deependujha in #561
  • fix: @benchmark bot fixes by @deependujha in #565
  • Fix IndexError when resuming after some workers are done by @philgzl in #567
  • ref: simplify cache dir creation and remove repeated parts by @bhimrazy in #568
  • fix: suppress FileNotFoundError when acquiring file lock for count file by @bhimrazy in #570
  • fix: Consolidate Cache Handling + Fix DDP Multi-Indexing for huggingface datasets by @bhimrazy in #569
  • update readme to include best practices for image data optimization by @bhimrazy in #577

New Contributors

Full Changelog: v0.2.45...v0.2.46