Release LitData v0.2.55 · Lightning-AI/litData

Lightning AI ⚡ is excited to announce the release of LitData v0.2.55.

Highlights

[Fixed] Writing compressed data to a lightning_storage folder

This release fixes errors raised when writing compressed output data to a lightning_storage folder. Previously, a snippet like the following would fail:

from litdata import StreamingDataset, StreamingDataLoader, optimize
import time

def should_keep(data):
    if data % 2 == 0:
        yield data


if __name__ == "__main__":
    output_dir = "/teamspace/lightning_storage/my-folder-1/output"
    optimize(
        fn=should_keep,
        inputs=list(range(500)),
        output_dir=output_dir,
        chunk_bytes="64MB",
        num_workers=4,
        compression="zstd",  # Previously, this would cause an error
    )
    time.sleep(20)
    dataset = StreamingDataset(output_dir)
    dataloader = StreamingDataLoader(dataset, batch_size=32, num_workers=4)
    for _ in dataloader:
        # process code here
        pass
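
As a quick sanity check on what the snippet above is expected to write, the filter can be previewed in plain Python without LitData. This is only a sketch: `preview_kept` is a hypothetical helper introduced here for illustration, and it mirrors the idea that a generator-style `fn` yields zero or more samples per input. It does not reproduce what `optimize()` does internally.

```python
def should_keep(data):
    # Yield the sample only when it is even; odd inputs yield nothing.
    if data % 2 == 0:
        yield data


def preview_kept(inputs):
    # Hypothetical helper (not part of LitData): flatten everything the
    # generator yields for each input into a single list.
    return [sample for item in inputs for sample in should_keep(item)]


kept = preview_kept(range(500))
print(len(kept), kept[:5])  # 250 [0, 2, 4, 6, 8]
```

Under that assumption, the dataset read back in the snippet above should contain the 250 even values from the 500 inputs.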

Changes

Fixed

Fix errors when using compression and r2 in optimize() by @pwgardipee in #715

Changed

Remove s5cmd from the R2 downloader by @pwgardipee in #714

Chores

chore(ci): Add step to minimize uv cache in CI workflow by @bhimrazy in #713

Full Changelog: v0.2.54...v0.2.55

🧑‍💻 Contributors

We thank everyone who submitted issues, features, fixes, and doc changes. It's the only way we can collectively make LitData better for everyone. Nice job!

Key Contributors

@pwgardipee @bhimrazy

Thank you ❤️ and we hope you'll keep them coming!
