
Commit c0c7386

pwgardipee (Peyton Gardipee) and tchaton authored
Add readme docs for references to data connection dirs (#708)
* Add readme docs for references to data connection dirs
* Change title

Co-authored-by: thomas chaton <[email protected]>

* Add download example

---------

Co-authored-by: Peyton Gardipee <[email protected]>
Co-authored-by: thomas chaton <[email protected]>
1 parent e3a62c9 commit c0c7386

1 file changed: +49 -0 lines changed

README.md

Lines changed: 49 additions & 0 deletions
@@ -1619,6 +1619,55 @@ if __name__ == "__main__":

</details>

<details>
  <summary> ✅ Lightning AI Data Connections - Direct download and upload </summary>

&nbsp;

[Lightning Studios](https://lightning.ai/) have special directories for data connections that are available to an entire teamspace. LitData functions that reference those directories run significantly faster, because uploads and downloads go directly to and from the bucket that backs the directory.

For example, the following code uploads its output artifacts directly to the `my-data-1` S3 bucket.

```python
from litdata import optimize


def should_keep(data):
    # Keep only even numbers; yielding nothing drops the item.
    if data % 2 == 0:
        yield data


if __name__ == "__main__":
    optimize(
        fn=should_keep,
        inputs=list(range(1000)),
        # Path inside an S3 data connection: output is uploaded directly to the bucket.
        output_dir="/teamspace/s3_connections/my-data-1/output",
        chunk_bytes="64MB",
        num_workers=1,
    )
```

Similarly, the following code downloads data directly from the `my-bucket-1` S3 bucket.

```python
from litdata import StreamingRawDataset

if __name__ == "__main__":
    # Path inside an S3 data connection: files are downloaded directly from the bucket.
    data_dir = "/teamspace/s3_connections/my-bucket-1/data"

    raw_dataset = StreamingRawDataset(data_dir)

    data = list(raw_dataset)
    print(data)
```

References to any of the following directories will work similarly:

1. `/teamspace/lightning_storage/...`
2. `/teamspace/s3_connections/...`
3. `/teamspace/gcs_connections/...`
4. `/teamspace/s3_folders/...`
5. `/teamspace/gcs_folders/...`
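
To check whether a given path falls under one of the directories listed above before launching a job, a small helper along these lines may be useful. This is a minimal sketch and not part of LitData: the names `DATA_CONNECTION_ROOTS` and `is_data_connection_path` are hypothetical, and the prefixes are copied from the list above.

```python
from pathlib import PurePosixPath

# Directory roots copied from the list above (assumption: the list is complete).
DATA_CONNECTION_ROOTS = (
    "/teamspace/lightning_storage",
    "/teamspace/s3_connections",
    "/teamspace/gcs_connections",
    "/teamspace/s3_folders",
    "/teamspace/gcs_folders",
)


def is_data_connection_path(path: str) -> bool:
    """Return True if `path` lies strictly inside one of the roots above.

    Hypothetical helper for illustration only; LitData does not expose it.
    """
    candidate = PurePosixPath(path)
    # Compare whole path components, so "/teamspace/s3_connections_backup"
    # is not mistaken for "/teamspace/s3_connections".
    return any(
        PurePosixPath(root) in candidate.parents for root in DATA_CONNECTION_ROOTS
    )


if __name__ == "__main__":
    print(is_data_connection_path("/teamspace/s3_connections/my-data-1/output"))
    print(is_data_connection_path("/tmp/output"))
```

Matching on path components rather than on a raw string prefix keeps sibling directories with similar names from being treated as data connections.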
</details>

&nbsp;

