12 Jun 11:45

tchaton

v0.2.9

5a8c2c9

Release v0.2.9

What's Changed

Compatibility with downloading data from GCP by @dangthatsright in #154
Flexibility to set device number < total device count by @yhl48 in #155
Remove DataLoader example in README by @tchaton in #162
Add support for custom collate with the StreamingDataLoader by @tchaton in #163
Fix CombinedDataset with more than 2 streaming datasets by @tchaton in #164
Bump version 0.2.9 by @tchaton in #165
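#163 lets the StreamingDataLoader accept a custom collate function. A collate function is any callable that turns the list of samples drawn for one batch into a single batch object. The following is a minimal pure-Python sketch of one that pads variable-length token lists; the sample layout (a dict with a `tokens` key), `PAD_ID`, and `pad_collate` are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, List

PAD_ID = 0  # hypothetical padding token id


def pad_collate(samples: List[Dict[str, Any]]) -> Dict[str, List[List[int]]]:
    """Pad every sample's `tokens` list to the longest one in the batch."""
    max_len = max((len(s["tokens"]) for s in samples), default=0)
    tokens = [s["tokens"] + [PAD_ID] * (max_len - len(s["tokens"])) for s in samples]
    # The mask marks real tokens with 1 and padding with 0.
    mask = [[1] * len(s["tokens"]) + [0] * (max_len - len(s["tokens"])) for s in samples]
    return {"tokens": tokens, "mask": mask}


batch = pad_collate([{"tokens": [5, 6, 7]}, {"tokens": [8]}])
print(batch["tokens"])  # [[5, 6, 7], [8, 0, 0]]
print(batch["mask"])    # [[1, 1, 1], [1, 0, 0]]
```

Assuming the StreamingDataLoader follows the `collate_fn` convention of PyTorch's `DataLoader`, which it builds on, such a callable would be passed as `StreamingDataLoader(dataset, batch_size=..., collate_fn=pad_collate)`; with torch available it would usually return tensors rather than nested lists.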

New Contributors

@dangthatsright made their first contribution in #154

Full Changelog: v0.2.8...v0.2.9

Contributors

dangthatsright, tchaton, and yhl48

03 Jun 08:33

tchaton

v0.2.8

6519a98

Release v0.2.8

What's Changed

Update README.md by @tchaton in #143
Performance improvement for processing by @sritterginkgo in #146
Fix drop_last not being passed down from the StreamingDataLoader to the datasets by @tchaton in #147
Bump pytest from 8.2.0 to 8.2.1 by @dependabot in #148
LitData release version bump 0.2.8 by @tchaton in #153
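#147 makes the StreamingDataLoader pass `drop_last` down to the datasets. In PyTorch's `DataLoader`, `drop_last=True` drops the last batch when it is incomplete. The arithmetic below is a small sketch of how the flag changes the number of batches; `num_batches` is a hypothetical helper, not part of litdata.

```python
def num_batches(num_samples: int, batch_size: int, drop_last: bool) -> int:
    """Count the batches produced from num_samples items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if drop_last:
        # The trailing partial batch is discarded.
        return num_samples // batch_size
    # The trailing partial batch is kept (integer ceiling division).
    return (num_samples + batch_size - 1) // batch_size


print(num_batches(10, 4, drop_last=True))   # 2
print(num_batches(10, 4, drop_last=False))  # 3
```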

New Contributors

@sritterginkgo made their first contribution in #146

Full Changelog: v0.2.7...v0.2.8

Contributors

tchaton, dependabot, and sritterginkgo

24 May 18:52

tchaton

v0.2.7

f23ba62

Release v0.2.7

What's Changed

Fix the NoHeaderTensorSerializer for 1D tensors (other than tokens) by @enrico-stauss in #124
Fix infinite sleep when loading a local compressed dataset by @wzf03 in #127
Fix configuration of a custom serializer for one of the predefined types by @enrico-stauss in #125
Add dist env detection via env vars by @gkroiz in #95
Fix empty tensor deserialization by @enrico-stauss in #131
Bump JamesIves/github-pages-deploy-action from 4.5.0 to 4.6.1 by @dependabot in #132
Prevent race deletion by @tchaton in #136
Add support for exact iteration by @tchaton in #139
Bump LitData version 0.2.7 by @tchaton in #142
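#95 adds detection of the distributed environment from environment variables. As a sketch of the general idea, the snippet below reads `WORLD_SIZE`, `RANK`, and `LOCAL_RANK`, the variables set by launchers such as `torchrun`, and falls back to a single process when they are absent. The exact variables litdata inspects may differ; `DistEnv` and `detect_dist_env` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DistEnv:
    world_size: int
    global_rank: int
    local_rank: int


def detect_dist_env(environ: Optional[Mapping[str, str]] = None) -> DistEnv:
    """Read torchrun-style variables, defaulting to a single process."""
    env = os.environ if environ is None else environ
    world_size = int(env.get("WORLD_SIZE", "1"))
    global_rank = int(env.get("RANK", "0"))
    local_rank = int(env.get("LOCAL_RANK", "0"))
    if not 0 <= global_rank < world_size:
        raise ValueError(f"RANK={global_rank} is out of range for WORLD_SIZE={world_size}")
    return DistEnv(world_size, global_rank, local_rank)


print(detect_dist_env({"WORLD_SIZE": "8", "RANK": "3", "LOCAL_RANK": "3"}))
# DistEnv(world_size=8, global_rank=3, local_rank=3)
```

A dataset could then give each rank a disjoint slice of sample indices, for example `range(env.global_rank, num_samples, env.world_size)`.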

New Contributors

@enrico-stauss made their first contribution in #124
@wzf03 made their first contribution in #127
@gkroiz made their first contribution in #95

Full Changelog: v0.2.6...v0.2.7

Contributors

tchaton, dependabot, and 3 other contributors

07 May 15:02

tchaton

v0.2.6

efa4ae0

Release 0.2.6

What's Changed

Bump pytest from 8.0.2 to 8.2.0 by @dependabot in #115
Bump coverage from 7.4.4 to 7.5.0 by @dependabot in #117
Bump pytest-cov from 4.1.0 to 5.0.0 by @dependabot in #116
Resolve some bugs by @tchaton in #121
Add support for iterate_over_all for the CombinedDataset by @tchaton in #122
Update version 0.2.6 by @tchaton in #123
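#122 adds an `iterate_over_all` option to the CombinedDataset. The pure-Python sketch below illustrates the idea under our own assumptions: at each step it picks one source at random according to weights; with `iterate_over_all=False` it stops as soon as the chosen source is exhausted, and with `iterate_over_all=True` it drops exhausted sources and continues until every source is drained. It works for any number of sources. `combined_iter` is hypothetical and does not reproduce litdata's sampling logic.

```python
import random
from typing import Iterable, Iterator, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def combined_iter(
    sources: Sequence[Iterable[T]],
    weights: Optional[Sequence[float]] = None,
    iterate_over_all: bool = True,
    seed: int = 42,
) -> Iterator[T]:
    """Interleave any number of sources by weighted random choice."""
    iterators: List[Iterator[T]] = [iter(s) for s in sources]
    w = list(weights) if weights is not None else [1.0] * len(iterators)
    if len(w) != len(iterators) or any(x <= 0 for x in w):
        raise ValueError("need one positive weight per source")
    rng = random.Random(seed)
    active = list(range(len(iterators)))
    while active:
        idx = rng.choices(active, weights=[w[i] for i in active])[0]
        try:
            item = next(iterators[idx])
        except StopIteration:
            if not iterate_over_all:
                return  # stop as soon as one source runs dry
            active.remove(idx)  # drop the exhausted source and keep going
            continue
        yield item


print(sorted(combined_iter([range(3), range(10, 12), range(20, 25)])))
# [0, 1, 2, 10, 11, 20, 21, 22, 23, 24]
```

With `iterate_over_all=True` every sample from every source is yielded exactly once; with `False` the total depends on which source happens to run out first.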

Full Changelog: v0.2.5...v0.2.6

Contributors

tchaton and dependabot

24 Apr 16:30

tchaton

v0.2.5

26119e3

Release 0.2.5

What's Changed

Remove condition on torch installation by @tchaton in #110
Bump version 0.2.5 by @tchaton in #111

Full Changelog: v0.2.4...v0.2.5

Contributors

tchaton

24 Apr 16:13

tchaton

v0.2.4

d0d19e6

Release 0.2.4

What's Changed

Update LitGPT references in README.md by @rasbt in #90
Don't raise a RuntimeError if the downloader doesn't exist by @tchaton in #98
Added call to setup function of serializer class to set data format by @vgurev in #96
Fix map() failing to create dataset when input_dir is None by @awaelchli in #100
StreamingDataset torch compatibility by @yhl48 in #108
Move to version 0.2.4 by @tchaton in #109

New Contributors

@rasbt made their first contribution in #90
@vgurev made their first contribution in #96
@awaelchli made their first contribution in #100
@yhl48 made their first contribution in #108

Full Changelog: v0.2.3...v0.2.4

Contributors

awaelchli, rasbt, and 3 other contributors

03 Apr 09:18

tchaton

v0.2.3

ee69581

Release 0.2.3

Full Changelog: v0.2.2...v0.2.3

08 Mar 15:17

tchaton

v0.2.2

c3f2278

Release 0.2.2

A couple of tiny fixes.

05 Mar 09:55

tchaton

v0.2.1

e89b5a2

Release 0.2.1

Minor fixes.

26 Feb 13:33

tchaton

v0.2.0

a05495e

Release 0.2.0

⚡ Welcome to Lightning Data

We developed StreamingDataset to optimize training on large datasets stored in the cloud while prioritizing speed, affordability, and scalability.

Specifically crafted for multi-GPU and multi-node distributed training (with DDP, FSDP, etc.) of large models, it enhances accuracy, performance, and user-friendliness. Training efficiently is now possible regardless of where the data lives: simply stream in the required data when it is needed.

The StreamingDataset is compatible with any data type, including images, text, video, audio, geospatial, and multimodal data, and it is a drop-in replacement for your PyTorch IterableDataset class. For example, it is used by Lit-GPT to pretrain LLMs.
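The streaming idea (fetch data only when it is needed rather than downloading the whole dataset first) can be illustrated with a small pure-Python sketch. It assumes the dataset is stored as fixed-size chunks and that `fetch_chunk`, a hypothetical stand-in for a cloud download, returns one chunk's samples; chunks are fetched on first access and kept in a small LRU cache. `LazyChunkedDataset` and `fake_fetch` are hypothetical names. This is not litdata's implementation, which additionally deals with shuffling and with sharding data across ranks and workers.

```python
from collections import OrderedDict
from typing import Callable, Generic, Iterator, List, TypeVar

T = TypeVar("T")


class LazyChunkedDataset(Generic[T]):
    """Serve samples from a chunked dataset, fetching each chunk on first use."""

    def __init__(self, num_samples: int, chunk_size: int,
                 fetch_chunk: Callable[[int], List[T]], max_cached_chunks: int = 2) -> None:
        self.num_samples = num_samples
        self.chunk_size = chunk_size
        self.fetch_chunk = fetch_chunk  # hypothetical stand-in for a cloud download
        self.max_cached_chunks = max_cached_chunks
        self._cache: "OrderedDict[int, List[T]]" = OrderedDict()

    def __len__(self) -> int:
        return self.num_samples

    def __getitem__(self, index: int) -> T:
        if not 0 <= index < self.num_samples:
            raise IndexError(index)
        chunk_id, offset = divmod(index, self.chunk_size)
        chunk = self._cache.get(chunk_id)
        if chunk is None:
            chunk = self.fetch_chunk(chunk_id)  # fetched only when first needed
            self._cache[chunk_id] = chunk
            if len(self._cache) > self.max_cached_chunks:
                self._cache.popitem(last=False)  # evict the least recently used chunk
        else:
            self._cache.move_to_end(chunk_id)  # mark as most recently used
        return chunk[offset]

    def __iter__(self) -> Iterator[T]:
        for i in range(self.num_samples):
            yield self[i]


fetched: List[int] = []


def fake_fetch(chunk_id: int) -> List[int]:
    """Pretend download: chunk k holds samples 4k..4k+3 (10 samples in total)."""
    fetched.append(chunk_id)
    start = chunk_id * 4
    return list(range(start, min(start + 4, 10)))


ds = LazyChunkedDataset(num_samples=10, chunk_size=4, fetch_chunk=fake_fetch)
print(ds[5], fetched)  # 5 [1]
print(list(ds))        # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(fetched)         # [1, 0, 2]
```

In this run, reading index 5 fetches only chunk 1; the full pass then fetches chunks 0 and 2 while reusing the cached chunk 1.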

This release marks the first release from litdata. From now on, we will track all changes in a CHANGELOG.md file.

Thanks to all contributors.