Releases: Lightning-AI/litData

Release v0.2.9

12 Jun 11:45
5a8c2c9

What's Changed

  • Compatibility with downloading data from GCP by @dangthatsright in #154
  • Flexibility to set device number < total device count by @yhl48 in #155
  • Remove DataLoader example in README by @tchaton in #162
  • Add support for custom collate with the StreamingDataLoader by @tchaton in #163
  • (fix) CombinedDataset with more than 2 streaming datasets by @tchaton in #164
  • Bump version 0.2.9 by @tchaton in #165
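PR #163 adds support for a custom collate function in the `StreamingDataLoader`. As a minimal sketch, assuming the loader forwards `collate_fn` the same way PyTorch's `DataLoader` does, a collate function receives a list of samples and returns one batch. The `pad_collate` helper, the sample layout (`{"tokens": ..., "label": ...}`) and `PAD_ID` below are hypothetical, chosen only to illustrate padding variable-length sequences.

```python
from typing import Any, Dict, List

PAD_ID = 0  # hypothetical padding token id


def pad_collate(samples: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pad variable-length token lists in a batch to the longest sample.

    Each sample is assumed (hypothetically) to look like
    {"tokens": [int, ...], "label": int}.
    """
    max_len = max(len(s["tokens"]) for s in samples)
    # Right-pad every token list with PAD_ID up to max_len.
    tokens = [s["tokens"] + [PAD_ID] * (max_len - len(s["tokens"])) for s in samples]
    # Attention mask: 1 marks a real token, 0 marks padding.
    mask = [[1] * len(s["tokens"]) + [0] * (max_len - len(s["tokens"])) for s in samples]
    labels = [s["label"] for s in samples]
    return {"tokens": tokens, "mask": mask, "labels": labels}


batch = pad_collate([{"tokens": [5, 6, 7], "label": 1}, {"tokens": [8], "label": 0}])
print(batch["tokens"])  # [[5, 6, 7], [8, 0, 0]]
print(batch["mask"])    # [[1, 1, 1], [1, 0, 0]]
```

With litdata installed, such a function would then be passed along the lines of `StreamingDataLoader(dataset, batch_size=2, collate_fn=pad_collate)`. Plain Python lists keep the sketch dependency-free; a real training pipeline would typically return tensors instead.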

New Contributors

Full Changelog: v0.2.8...v0.2.9

Release v0.2.8

03 Jun 08:33
6519a98

What's Changed

New Contributors

Full Changelog: v0.2.7...v0.2.8

Release v0.2.7

24 May 18:52
f23ba62

What's Changed

  • Fix the NoHeaderTensorSerializer for 1D tensors (other than tokens) by @enrico-stauss in #124
  • Fix infinite sleep when loading a local compressed dataset by @wzf03 in #127
  • Fix configuration of a custom serializer for one of the predefined types by @enrico-stauss in #125
  • Add dist env detection via env vars by @gkroiz in #95
  • Fix empty tensor deserialization by @enrico-stauss in #131
  • Bump JamesIves/github-pages-deploy-action from 4.5.0 to 4.6.1 by @dependabot in #132
  • Prevent race deletion by @tchaton in #136
  • Add support for exact iteration by @tchaton in #139
  • Bump LitData version 0.2.7 by @tchaton in #142
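PR #95 adds detection of the distributed environment via environment variables. Which variables litdata actually reads is not stated here; the minimal sketch below assumes the names set by `torchrun` (`WORLD_SIZE`, `RANK`, `LOCAL_RANK`) and falls back to a single-process setup when they are absent. `detect_dist_env` and `DistEnv` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DistEnv:
    world_size: int
    global_rank: int
    local_rank: int


def detect_dist_env(environ: Optional[Mapping[str, str]] = None) -> DistEnv:
    """Read distributed settings from torchrun-style environment variables.

    Missing variables default to a single process (world size 1, rank 0).
    """
    env = os.environ if environ is None else environ
    world_size = int(env.get("WORLD_SIZE", "1"))
    global_rank = int(env.get("RANK", "0"))
    local_rank = int(env.get("LOCAL_RANK", "0"))
    # Catch inconsistent launcher settings early.
    if not 0 <= global_rank < world_size:
        raise ValueError(f"RANK={global_rank} is out of range for WORLD_SIZE={world_size}")
    return DistEnv(world_size, global_rank, local_rank)


print(detect_dist_env({"WORLD_SIZE": "8", "RANK": "3", "LOCAL_RANK": "3"}))
# DistEnv(world_size=8, global_rank=3, local_rank=3)
```

Passing an explicit mapping instead of always reading `os.environ` keeps the helper easy to test without mutating the process environment.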

New Contributors

Full Changelog: v0.2.6...v0.2.7

Release 0.2.6

07 May 15:02
efa4ae0

What's Changed

Full Changelog: v0.2.5...v0.2.6

Release 0.2.5

24 Apr 16:30
26119e3

What's Changed

Full Changelog: v0.2.4...v0.2.5

Release 0.2.4

24 Apr 16:13
d0d19e6

What's Changed

  • Update LitGPT references in README.md by @rasbt in #90
  • Don't raise a RuntimeError if the downloader doesn't exist by @tchaton in #98
  • Added call to setup function of serializer class to set data format by @vgurev in #96
  • Fix map() failing to create dataset when input_dir is None by @awaelchli in #100
  • StreamingDataset torch compatibility by @yhl48 in #108
  • Move to version 0.2.4 by @tchaton in #109
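PR #96 calls a serializer's setup function so it can learn the data format. The sketch below shows why such a hook matters for a header-less format: the raw bytes carry no type information, so the element type must be configured before anything is decoded. The class `NoHeaderArraySerializer`, its method signatures, and the use of Python `array` typecodes as the "data format" are all assumptions for illustration, not litdata's serializer interface.

```python
from array import array
from typing import List, Optional


class NoHeaderArraySerializer:
    """Hypothetical serializer that stores raw values with no type header.

    Because the payload has no header, the element type must be set via
    setup() before serialize() or deserialize() can be used.
    """

    def __init__(self) -> None:
        self._typecode: Optional[str] = None

    def setup(self, data_format: str) -> None:
        # Here `data_format` is an `array` typecode such as "i" or "d".
        array(data_format)  # fails fast on an invalid typecode
        self._typecode = data_format

    def serialize(self, values: List) -> bytes:
        if self._typecode is None:
            raise RuntimeError("call setup() before serialize()")
        return array(self._typecode, values).tobytes()

    def deserialize(self, payload: bytes) -> List:
        if self._typecode is None:
            raise RuntimeError("call setup() before deserialize()")
        arr = array(self._typecode)
        arr.frombytes(payload)  # an empty payload yields an empty array
        return arr.tolist()


s = NoHeaderArraySerializer()
s.setup("i")
print(s.deserialize(s.serialize([1, 2, 3])))  # [1, 2, 3]
print(s.deserialize(b""))                     # []
```

The empty-payload case mirrors the kind of edge case addressed for tensors in #131: decoding zero bytes should produce an empty container rather than an error.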

New Contributors

Full Changelog: v0.2.3...v0.2.4

Release 0.2.3

03 Apr 09:18
ee69581

Full Changelog: v0.2.2...v0.2.3

Release 0.2.2

08 Mar 15:17
c3f2278

A couple of tiny fixes.

Release 0.2.1

05 Mar 09:55
e89b5a2

Release 0.2.1. Minor fixes.

Release 0.2.0

26 Feb 13:33
a05495e

⚡ Welcome to Lightning Data

We developed StreamingDataset to optimize training of large datasets stored on the cloud while prioritizing speed, affordability, and scalability.

Specifically crafted for multi-GPU and multi-node distributed training of large models (with DDP, FSDP, etc.), it enhances accuracy, performance, and user-friendliness. You can now train efficiently regardless of where the data is stored: simply stream in the data you need, when you need it.

The StreamingDataset is compatible with any data type, including images, text, video, audio, geo-spatial, and multimodal data, and it is a drop-in replacement for your PyTorch IterableDataset class. For example, it is used by Lit-GPT to pretrain LLMs.
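The two ideas described above, streaming data in only when needed and splitting it across GPUs or nodes, can be illustrated with a small conceptual sketch. This is not StreamingDataset's implementation: the round-robin chunk assignment, the `fetch_chunk` callback standing in for a cloud download, and the `stream_items` name are all assumptions for illustration.

```python
from typing import Callable, Iterator, List, Sequence


def stream_items(
    chunk_ids: Sequence[str],
    fetch_chunk: Callable[[str], List[int]],
    rank: int,
    world_size: int,
) -> Iterator[int]:
    """Yield items from the chunks assigned to this rank, fetching lazily.

    Chunk i goes to rank i % world_size, and a chunk is only fetched
    when iteration actually reaches it.
    """
    for i, chunk_id in enumerate(chunk_ids):
        if i % world_size != rank:
            continue  # another rank streams this chunk
        # A real system would download this chunk from cloud storage here.
        yield from fetch_chunk(chunk_id)


# Fake "remote" storage that records which chunks were fetched.
store = {"c0": [0, 1], "c1": [2, 3], "c2": [4, 5], "c3": [6, 7]}
fetched: List[str] = []


def fake_fetch(chunk_id: str) -> List[int]:
    fetched.append(chunk_id)
    return store[chunk_id]


it = stream_items(list(store), fake_fetch, rank=1, world_size=2)
print(next(it))  # 2
print(fetched)   # ['c1'] (c3 has not been fetched yet)
print(list(it))  # [3, 6, 7]
```

Because the function is a generator, nothing is fetched until iteration begins, and each rank only ever touches its own share of the chunks.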

This is the first release of litdata. From now on, we will track all changes in a CHANGELOG.md file.

Thanks to all contributors.