6 changes: 3 additions & 3 deletions docs/hub/datasets-upload-guide-llm.md
@@ -67,8 +67,8 @@ find . -name "*.jpg" | wc -l
```yaml
# Machine-readable Hub limits
hub_limits:
-  max_file_size_gb: 50 # absolute hard stop enforced by LFS
-  recommended_file_size_gb: 20 # best-practice shard size
+  max_file_size_gb: 200 # absolute hard stop enforced by LFS
> **Member:** is it LFS?

> **Member Author:** Good catch! I've just checked and we still reference LFS in quite a few places, so I'll make a separate PR to tidy those.

> **Member:** in the product/platform wordings we moved to "Large Files". cc @Pierrci our terminology expert

+  recommended_file_size_gb: 50 # best-practice shard size
max_files_per_folder: 10000 # Git performance threshold
max_files_per_repo: 100000 # Repository file count limit
recommended_repo_size_gb: 300 # public-repo soft cap; contact HF if larger
@@ -80,7 +80,7 @@ hub_limits:
- Free: 100GB private datasets
- Pro (for individuals) | Team or Enterprise (for organizations): 1TB+ private storage per seat (see [pricing](https://huggingface.co/pricing))
- Public: 1TB (contact [email protected] for larger)
-- Per file: 50GB max, 20GB recommended
+- Per file: 200GB max, <50GB recommended
- Per folder: <10k files

See https://huggingface.co/docs/hub/storage-limits#repository-limitations-and-recommendations for current recommendations on repository sizes and file counts.
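The per-file and per-folder limits listed above can be checked locally before uploading. A minimal sketch, assuming the thresholds from this doc (200GB hard max, 50GB recommended, <10k entries per folder); the function name is illustrative, not a Hub or `huggingface_hub` API:

```python
import os

GB = 1024 ** 3
MAX_FILE_SIZE = 200 * GB          # absolute per-file hard stop
RECOMMENDED_FILE_SIZE = 50 * GB   # best-practice shard size
MAX_ENTRIES_PER_FOLDER = 10_000   # Git performance threshold

def check_repo_layout(root: str) -> list[str]:
    """Return human-readable warnings for limits the local tree would hit."""
    warnings = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Both subdirectories and files count as entries in a folder.
        if len(dirnames) + len(filenames) > MAX_ENTRIES_PER_FOLDER:
            warnings.append(f"{dirpath}: >10k entries, use subdirectories")
        for name in filenames:
            size = os.path.getsize(os.path.join(dirpath, name))
            if size > MAX_FILE_SIZE:
                warnings.append(f"{name}: exceeds the 200GB hard limit, must be split")
            elif size > RECOMMENDED_FILE_SIZE:
                warnings.append(f"{name}: above the 50GB recommendation, consider splitting")
    return warnings
```

Running this over a staging directory before `upload` catches layout problems while they are still cheap to fix.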
4 changes: 2 additions & 2 deletions docs/hub/storage-limits.md
@@ -43,7 +43,7 @@ We gathered a list of tips and recommendations for structuring your repo. If you
| Repo size | - | contact us for large repos (TBs of data) |
| Files per repo | <100k | merge data into fewer files |
| Entries per folder | <10k | use subdirectories in repo |
-| File size | <20GB | split data into chunked files |
+| File size | <50GB | split data into chunked files |
| Commit size | <100 files* | upload files in multiple commits |
| Commits per repo | - | upload multiple files per commit and/or squash history |

@@ -67,7 +67,7 @@ which has very detailed documentation about the different factors that will impa
For example, json files can be merged into a single jsonl file, or large datasets can be exported as Parquet files or in [WebDataset](https://github.com/webdataset/webdataset) format.
- The maximum number of files per folder cannot exceed 10k files per folder. A simple solution is to
create a repository structure that uses subdirectories. For example, a repo with 1k folders from `000/` to `999/`, each containing at most 1000 files, is already enough.
-- **File size**: In the case of uploading large files (e.g. model weights), we strongly recommend splitting them **into chunks of around 20GB each**.
+- **File size**: In the case of uploading large files (e.g. model weights), we strongly recommend splitting them **into chunks of around 50GB each**.
There are a few reasons for this:
- Uploading and downloading smaller files is much easier both for you and the other users. Connection issues can always
happen when streaming data and smaller files avoid resuming from the beginning in case of errors.
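The chunking recommendation above can be sketched as follows. This is a minimal illustration, not a Hub tool: the `.partNNN` naming is an assumption, and real uploads would more likely use a format-aware split (e.g. Parquet shards) rather than raw byte slicing:

```python
import os

CHUNK_SIZE = 50 * 1024 ** 3  # recommended maximum shard size (50GB)

def split_file(path: str, chunk_size: int = CHUNK_SIZE,
               buffer_size: int = 8 * 1024 * 1024) -> list[str]:
    """Write `path` out as `<path>.partNNN` pieces and return their names."""
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            # Copy in small buffers so a 50GB chunk never sits in memory.
            first = src.read(min(buffer_size, chunk_size))
            if not first:
                break  # source exhausted
            part_name = f"{path}.part{index:03d}"
            with open(part_name, "wb") as dst:
                dst.write(first)
                remaining = chunk_size - len(first)
                while remaining > 0:
                    buf = src.read(min(buffer_size, remaining))
                    if not buf:
                        break
                    dst.write(buf)
                    remaining -= len(buf)
            parts.append(part_name)
            index += 1
    return parts
```

Concatenating the parts in order (`cat big.bin.part* > big.bin`) reconstructs the original file, which is what makes a resumed or partially failed transfer cheap to recover.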
> **Member:** but also update this part no?
>
> > In all cases no single LFS file will be able to be >50GB. I.e. 50GB is the hard limit for single file size.

> **Member:** (i think we said new file limit= 200GB)

> **Member Author:** Maybe:
>
> > In the case of uploading large files (e.g. model weights), we strongly recommend splitting them into chunks <200GB each.
>
> works?
