LitData v0.2.55

@pwgardipee pwgardipee released this 19 Sep 15:47
· 32 commits to main since this release
f990376

Lightning AI ⚡ is excited to announce the release of LitData v0.2.55

Highlights

[Fixed] Writing compressed data to a lightning_storage folder

This release fixes errors that occurred when writing compressed output data to a lightning_storage folder. Previously, a snippet like the following would fail:

from litdata import StreamingDataset, StreamingDataLoader, optimize
import time

def should_keep(data):
    if data % 2 == 0:
        yield data


if __name__ == "__main__":
    output_dir = "/teamspace/lightning_storage/my-folder-1/output"
    optimize(
        fn=should_keep,
        inputs=list(range(500)),
        output_dir=output_dir,
        chunk_bytes="64MB",
        num_workers=4,
        compression="zstd",  # Previously, this would cause an error
    )
    time.sleep(20)  # Wait before reading the output back
    dataset = StreamingDataset(output_dir)
    dataloader = StreamingDataLoader(dataset, batch_size=32, num_workers=4)
    for _ in dataloader:
        # process code here
        pass
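
To sanity-check what the snippet above should produce, the filter can be run on its own without LitData or a lightning_storage folder. This is a minimal sketch that assumes optimize() stores each item yielded by fn as one sample; expected_items is a hypothetical helper for illustration, not part of the LitData API.

```python
def should_keep(data):
    # Same filter as in the release snippet: yield only even inputs.
    if data % 2 == 0:
        yield data


def expected_items(fn, inputs):
    # Hypothetical helper (not part of LitData): collect everything
    # that fn yields for each input, in input order.
    return [item for x in inputs for item in fn(x)]


kept = expected_items(should_keep, range(500))
print(len(kept))  # 250 even numbers out of 500 inputs
```

Under that assumption, the dataset read back from output_dir would be expected to contain 250 samples, which is a quick way to confirm the compressed write succeeded.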

Changes

Fixed
  • Fix errors when using compression and r2 in optimize() by @pwgardipee in #715
Chores
  • chore(ci): Add step to minimize uv cache in CI workflow by @bhimrazy in #713

Full Changelog: v0.2.54...v0.2.55

🧑‍💻 Contributors

We thank everyone who submitted issues, features, fixes, and doc changes. It's the only way we can collectively make LitData better for everyone. Nice job!

Key Contributors

@pwgardipee @bhimrazy

Thank you ❤️ and we hope you'll keep them coming!