Update storage limits documentation to reflect new file size recommendations #2037
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -67,8 +67,8 @@ find . -name "*.jpg" | wc -l | |
| ```yaml | ||
| # Machine-readable Hub limits | ||
| hub_limits: | ||
| max_file_size_gb: 50 # absolute hard stop enforced by LFS | ||
| recommended_file_size_gb: 20 # best-practice shard size | ||
| max_file_size_gb: 200 # absolute hard stop enforced by LFS | ||
| recommended_file_size_gb: 50 # best-practice shard size | ||
| max_files_per_folder: 10000 # Git performance threshold | ||
| max_files_per_repo: 100000 # Repository file count limit | ||
| recommended_repo_size_gb: 300 # public-repo soft cap; contact HF if larger | ||
|
|
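As a reviewer aid, the updated values in this hunk could be exercised with a small check like the one below. This is only a sketch: the `HubLimits` class and `classify_file_size` function are hypothetical names, decimal gigabytes are assumed (the docs do not say GB vs GiB), and treating a file of exactly the recommended size as acceptable is also an assumption.

```python
from dataclasses import dataclass

# Assumption: 1 GB = 10**9 bytes; the documentation does not specify GB vs GiB.
GB = 1000 ** 3


@dataclass(frozen=True)
class HubLimits:
    """Hypothetical container mirroring the new values in the `hub_limits` YAML."""
    max_file_size_gb: int = 200
    recommended_file_size_gb: int = 50
    max_files_per_folder: int = 10_000
    max_files_per_repo: int = 100_000


def classify_file_size(size_bytes: int, limits: HubLimits = HubLimits()) -> str:
    """Return 'error' above the hard max, 'warn' above the recommendation, else 'ok'."""
    if size_bytes > limits.max_file_size_gb * GB:
        return "error"
    if size_bytes > limits.recommended_file_size_gb * GB:
        return "warn"
    return "ok"
```

With the old values (50 / 20), a 60 GB file would have been rejected outright; with the values proposed here it only triggers a warning.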
@@ -80,7 +80,7 @@ hub_limits: | |
| - Free: 100GB private datasets | ||
| - Pro (for individuals) | Team or Enterprise (for organizations): 1TB+ private storage per seat (see [pricing](https://huggingface.co/pricing)) | ||
| - Public: 1TB (contact [email protected] for larger) | ||
| - Per file: 50GB max, 20GB recommended | ||
| - Per file: 200GB max, <50GB recommended | ||
| - Per folder: <10k files | ||
|
|
||
| See https://huggingface.co/docs/hub/storage-limits#repository-limitations-and-recommendations for current recommendations for repository sizes and file counts. | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -43,7 +43,7 @@ We gathered a list of tips and recommendations for structuring your repo. If you | |
| | Repo size | - | contact us for large repos (TBs of data) | | ||
| | Files per repo | <100k | merge data into fewer files | | ||
| | Entries per folder | <10k | use subdirectories in repo | | ||
| | File size | <20GB | split data into chunked files | | ||
| | File size | <50GB | split data into chunked files | | ||
| | Commit size | <100 files* | upload files in multiple commits | | ||
| | Commits per repo | - | upload multiple files per commit and/or squash history | | ||
|
|
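The "Commit size" row in the table above (fewer than 100 files per commit, uploaded across multiple commits) could be followed with a batching helper along these lines. This is a sketch only: `batch_for_commits` is a hypothetical name, the default of 99 is simply one way to stay under the stated threshold, and the actual upload call is deliberately left out.

```python
from typing import Iterator, List, Sequence


def batch_for_commits(paths: Sequence[str], max_per_commit: int = 99) -> Iterator[List[str]]:
    """Yield consecutive groups of paths, each small enough for a single commit.

    Assumption: 99 keeps every group under the "<100 files" recommendation.
    """
    if max_per_commit < 1:
        raise ValueError("max_per_commit must be at least 1")
    for start in range(0, len(paths), max_per_commit):
        yield list(paths[start:start + max_per_commit])
```

Each yielded group would then be handed to whatever upload mechanism is in use, one commit per group.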
||
|
|
@@ -67,7 +67,7 @@ which has very detailed documentation about the different factors that will impa | |
| For example, json files can be merged into a single jsonl file, or large datasets can be exported as Parquet files or in [WebDataset](https://github.com/webdataset/webdataset) format. | ||
| - The maximum number of files per folder cannot exceed 10k files per folder. A simple solution is to | ||
| create a repository structure that uses subdirectories. For example, a repo with 1k folders from `000/` to `999/`, each containing at most 1000 files, is already enough. | ||
| - **File size**: In the case of uploading large files (e.g. model weights), we strongly recommend splitting them **into chunks of around 20GB each**. | ||
| - **File size**: In the case of uploading large files (e.g. model weights), we strongly recommend splitting them **into chunks of around 50GB each**. | ||
| There are a few reasons for this: | ||
| - Uploading and downloading smaller files is much easier both for you and the other users. Connection issues can always | ||
| happen when streaming data and smaller files avoid resuming from the beginning in case of errors. | ||
|
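The two recommendations in this hunk (subdirectories `000/` to `999/` with at most 1000 files each, and chunks of around 50GB) can be turned into simple arithmetic. The sketch below is illustrative: `shard_count` and `bucketed_path` are hypothetical helpers, and decimal gigabytes are assumed.

```python
# Assumption: 1 GB = 10**9 bytes; the documentation does not specify GB vs GiB.
GB = 1000 ** 3


def shard_count(total_bytes: int, shard_bytes: int = 50 * GB) -> int:
    """Number of chunks needed so that no chunk exceeds shard_bytes (at least 1)."""
    if shard_bytes < 1:
        raise ValueError("shard_bytes must be positive")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return max(1, -(-total_bytes // shard_bytes))


def bucketed_path(index: int, filename: str, files_per_folder: int = 1000) -> str:
    """Place the index-th file in a zero-padded folder such as '000/' or '999/'."""
    if index < 0:
        raise ValueError("index must be non-negative")
    folder = index // files_per_folder
    return f"{folder:03d}/{filename}"
```

For instance, a 120 GB checkpoint would be split into 3 chunks under the new recommendation, where the previous 20GB guidance would have implied 6.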
Member
but also update this part no?
Member
(i think we said new file limit = 200GB)
Member
Author
Maybe:
works?
is it LFS?
Good catch! I've just checked and we still reference `LFS` in quite a few places, so I'll make a separate PR to tidy those.
in the product/platform wordings we moved to "Large Files". cc @Pierrci our terminology expert