batch processing with cloud storage #1726

Conversation
…ting the reference_file for the user.
> - `az://container/images/*.png` - PNG files in images folder
>
> !!! hint "Cloud Storage Examples"
should highlight here that the credentials are used only locally by the CLI tool to generate signed URLs
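For context on that point: presigning is a purely local HMAC computation over a description of the request; nothing is sent over the network until the URL is actually fetched. Below is a stdlib sketch of AWS SigV4 query-string presigning for a GET (illustrative only; the CLI relies on fsspec / the provider SDKs, and the helper name `presign_s3_get` is made up here):

```python
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import quote


def presign_s3_get(bucket, key, access_key, secret_key,
                   region="us-east-1", expires=3600, now=None):
    """Sketch of SigV4 query-string presigning for GET object.

    Runs entirely locally: only the secret key and HMACs are involved.
    """
    now = now or datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    host = f"{bucket}.s3.{region}.amazonaws.com"
    scope = f"{datestamp}/{region}/s3/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    canonical_query = "&".join(
        f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in sorted(params.items())
    )
    canonical_request = "\n".join([
        "GET", quote(f"/{key}"), canonical_query,
        f"host:{host}\n", "host", "UNSIGNED-PAYLOAD",
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode()).hexdigest(),
    ])

    def _hmac(key_bytes, msg):
        return hmac.new(key_bytes, msg.encode(), hashlib.sha256).digest()

    # Derive the per-date signing key, then sign the string-to-sign.
    signing_key = _hmac(
        _hmac(_hmac(_hmac(b"AWS4" + secret_key.encode(), datestamp), region), "s3"),
        "aws4_request",
    )
    signature = hmac.new(signing_key, string_to_sign.encode(), hashlib.sha256).hexdigest()
    return f"https://{host}/{quote(key)}?{canonical_query}&X-Amz-Signature={signature}"
```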
When it comes to implementation and functionality - looks ok
thanks @PawelPeczek-Roboflow
I agree. We could divide and conquer the namespace and probe for the existence of keys to optimally fan out the listing operation. That is something we could add later if this indeed becomes the limiting factor, or integrate it.
The primary goal of this PR was ease of use and better UX, with immediate feedback to the user that something is happening. IMHO, for payloads of 100k+ objects the user should be hooking their bucket up directly to our services and letting us manage this.
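The divide-and-conquer idea above can be sketched with the stdlib (purely illustrative: `FAKE_BUCKET` and the helper names are made up, and a real implementation would call the object store's list API for each prefix):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical flat object namespace; in reality this would be a bucket.
FAKE_BUCKET = {
    "images/a/1.png": 100,
    "images/a/2.png": 120,
    "images/b/3.png": 90,
    "images/c/4.jpg": 80,
}


def list_prefix(prefix):
    """Stand-in for a per-prefix object-store listing call."""
    return [k for k in FAKE_BUCKET if k.startswith(prefix)]


def fan_out_listing(prefixes, max_workers=4):
    """Probe several key prefixes in parallel and merge the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        chunks = pool.map(list_prefix, prefixes)
    return sorted(key for chunk in chunks for key in chunk)


keys = fan_out_listing(["images/a/", "images/b/", "images/c/"])
```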
PawelPeczek-Roboflow
left a comment
ok, fine for me
Description
https://www.loom.com/share/ce23eb8e863d43a78036b8f7608aab0b
closes jeremyprescott/ss-160-batch-processing-cloud-storage-improvements
This pull request simplifies the creation of the `reference_file` used by batch processing; it allows users to use their system credentials directly, without needing to create a custom script or figure out the format. This should cover 90% of users' use cases.

Screen.Recording.2025-11-17.at.13.59.35.mov
List any dependencies that are required for this change.
`fsspec`, which abstracts S3, GCS and Azure access. It is optionally installed.

Example usage
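To make the path handling concrete, here is a stdlib sketch of how a `--bucket-path` value could be split into protocol, bucket and key prefix before being handed to `fsspec` (the helper name is hypothetical; this is not the CLI's actual code):

```python
from urllib.parse import urlsplit


def parse_bucket_path(path):
    """Split a cloud path like 's3://bucket/prefix' into
    (protocol, bucket, prefix). Illustrative helper only."""
    parts = urlsplit(path)
    if parts.scheme not in {"s3", "gs", "az"}:
        raise ValueError(f"unsupported protocol: {parts.scheme!r}")
    return parts.scheme, parts.netloc, parts.path.lstrip("/")
```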
GCS
```shell
inference rf-cloud data-staging create-batch-of-images \
  --batch-id=test-gcs-lenny-$(date +%Y%m%d-%H%M%S) \
  --data-source=cloud-storage \
  --bucket-path=gs://roboflow-jeremy-test/
```

GCS credentials overwrite
`GOOGLE_APPLICATION_CREDENTIALS`

S3 staging
```shell
inference rf-cloud data-staging create-batch-of-images \
  --batch-id=test-s3-lenny-$(date +%Y%m%d-%H%M%S) \
  --data-source=cloud-storage \
  --bucket-path=s3://roboflow-jeremy-test/lenny
```

S3 credentials overwrite
`AWS_PROFILE`, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`

S3 Env
`AWS_ENDPOINT_URL`, `AWS_REGION`

AZURE staging
Azure does not support using system credentials; they need to be specified explicitly.
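A sketch of how that credential requirement could be resolved from the environment (the variable names match the ones used in the Azure examples; the helper itself and the SAS-token-over-account-key precedence are assumptions, not the CLI's actual logic):

```python
import os


def resolve_azure_credentials(env=None):
    """Illustrative resolution of Azure storage credentials from env vars.

    Assumed precedence: a SAS token overrides an account key.
    """
    env = os.environ if env is None else env
    account = env.get("AZURE_STORAGE_ACCOUNT_NAME")
    if not account:
        raise ValueError("AZURE_STORAGE_ACCOUNT_NAME must be set for az:// paths")
    sas = env.get("AZURE_STORAGE_SAS_TOKEN")
    if sas:
        return {"account_name": account, "sas_token": sas}
    key = env.get("AZURE_STORAGE_ACCOUNT_KEY")
    if key:
        return {"account_name": account, "account_key": key}
    raise ValueError("set AZURE_STORAGE_ACCOUNT_KEY or AZURE_STORAGE_SAS_TOKEN")
```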
```shell
AZURE_STORAGE_ACCOUNT_NAME=roboflowjeremy AZURE_STORAGE_ACCOUNT_KEY={KEY} \
  inference rf-cloud data-staging create-batch-of-images \
  --batch-id=test-azure-lenny-$(date +%Y%m%d-%H%M%S) \
  --data-source=cloud-storage \
  --bucket-path=az://rf-test/
```

AZURE credentials overwrite
`AZURE_STORAGE_SAS_TOKEN`

Type of change
Please delete options that are not relevant.
How has this change been tested? Please provide a test case or an example of how you tested the change.
YOUR_ANSWER
Any specific deployment considerations
For example, documentation changes, usability, usage/costs, secrets, etc.
Docs