LitData v0.2.55
Lightning AI ⚡ is excited to announce the release of LitData v0.2.55
Highlights
[Fixed] Writing compressed data to a lightning_storage folder
This release fixes errors that occurred when writing compressed output data to a lightning_storage folder. Previously, a code snippet like the following would fail.
from litdata import StreamingDataset, StreamingDataLoader, optimize
import time


def should_keep(data):
    if data % 2 == 0:
        yield data


if __name__ == "__main__":
    output_dir = "/teamspace/lightning_storage/my-folder-1/output"
    optimize(
        fn=should_keep,
        inputs=list(range(500)),
        output_dir=output_dir,
        chunk_bytes="64MB",
        num_workers=4,
        compression="zstd",  # Previously, this would cause an error
    )
    time.sleep(20)
    dataset = StreamingDataset(output_dir)
    dataloader = StreamingDataLoader(dataset, batch_size=32, num_workers=4)
    for _ in dataloader:
        # process code here
        pass

Changes
Fixed
- Fix errors when using compression and r2 in optimize() by @pwgardipee in #715
Changed
- Remove s5cmd from the R2 downloader by @pwgardipee in #714
Full Changelog: v0.2.54...v0.2.55
🧑‍💻 Contributors
We thank all folks who submitted issues, features, fixes, and doc changes. It's the only way we can collectively make LitData better for everyone. Nice job!
Key Contributors
Thank you ❤️ and we hope you'll keep them coming!