Describe the bug
If you try to use any of the database dump endpoints (SQL, CSV, or JSON), the data is loaded into memory and then written out as a dump file. To support databases of any size, we should investigate enhancements that allow any sized database to be exported. Currently the size limitation for Durable Objects is 1GB, with 10GB coming in the future. Operate under the assumption that we might be attempting to dump a 10GB database into a .sql file.
Another consideration is that because Durable Objects execute operations synchronously, we may need to allow for "breathing intervals". For example, we might allow the export operation to run for 5 seconds, then pause for 5 seconds if other requests are queued, and then pick up again. The goal here is to prevent locking the database for long periods of time (a minimal sketch follows the questions below).
But this then poses the questions:
- How do we continue operations that need more than 30 seconds to work?
- Where is the data stored as it's being created? (R2, S3, something else)?
- How do we deliver that dump information to the user after it's completed?
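For illustration, here is a minimal sketch of the breathing-interval idea, assuming a hypothetical exportNextBatch() helper that serializes the next slice of rows; the 5-second values simply mirror the example above and are not tuned.

```typescript
// Minimal sketch of a time-sliced export loop with "breathing intervals".
// exportNextBatch() is a hypothetical helper that serializes the next slice of
// rows and returns true once no rows remain.
const WORK_BUDGET_MS = 5_000;
const BREATHING_MS = 5_000;

async function exportWithBreathingRoom(
  exportNextBatch: () => Promise<boolean>
): Promise<void> {
  let finished = false;
  while (!finished) {
    const sliceStart = Date.now();
    // Do export work until the helper reports completion or the budget is spent.
    while (!finished && Date.now() - sliceStart < WORK_BUDGET_MS) {
      finished = await exportNextBatch();
    }
    if (!finished) {
      // Pause so queued requests against the Durable Object can be served.
      await new Promise((resolve) => setTimeout(resolve, BREATHING_MS));
    }
  }
}
```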
To Reproduce
Steps to reproduce the behavior:
- Hit the /export/dump endpoint on a large database - it will eventually fail when the 30-second request/response window closes
Run the following command in Terminal (replace the URL with yours); if your operation exceeds 30 seconds you should see a failed network response instead of a dump file.
curl --location 'https://starbasedb.YOUR-ID-HERE.workers.dev/export/dump' \
--header 'Authorization: Bearer ABC123' \
--output database_dump.sql
If you can't create a large enough test database, feel free to add code that sleeps for 29 seconds before the /export/dump functional code proceeds; you should also see the failure.
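For example, a temporary delay could be dropped in at the top of the handler (the placement inside the /export/dump handler is illustrative; remove before committing):

```typescript
// Artificial ~29 second delay to simulate a large export; for testing only.
await new Promise((resolve) => setTimeout(resolve, 29_000));
// ...existing /export/dump logic continues here...
```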
Expected behavior
As a user I would expect any and all of the specified data to be dumped out without an error and without partial results. Where it ends up for the user to access if the operation takes more than 30 seconds is up for discussion. Ideally, if the operation is shorter than 30 seconds, it could be returned as our cURL above works today (downloading the file from the response of the origin request), but perhaps after the timeout it continues on and uploads the dump to a destination the user can access afterwards?
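One possible shape for that behavior, sketched below with illustrative field names: return the file directly when the export finishes inside the request window, otherwise acknowledge with a 202 and a key pointing at where the dump will land.

```typescript
// Hedged sketch of the two delivery paths; dumpKey and the JSON fields are illustrative.
function exportResponse(
  completedInline: boolean,
  dumpKey: string,
  body?: ReadableStream
): Response {
  if (completedInline && body) {
    // Small databases: stream the dump back directly, as the cURL example works today.
    return new Response(body, {
      headers: {
        'Content-Type': 'application/sql',
        'Content-Disposition': `attachment; filename="${dumpKey}"`,
      },
    });
  }
  // Large databases: acknowledge and point at where the finished file will be available.
  return new Response(JSON.stringify({ status: 'processing', key: dumpKey }), {
    status: 202,
    headers: { 'Content-Type': 'application/json' },
  });
}
```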
Proposed Solution:
- For backups require an R2 binding
- Have a .sql file that gets created in R2 with a filename like dump_20240101-170000.sql, where the timestamp represents 2024-01-01 17:00:00
- Create the file and continuously append new chunks to it until reaching the end
- May need to utilize a DO alarm to continue the work after X time if a timeout occurs, and mark where it currently is in the process in internal memory so it can pick up and continue (see the sketch after this list)
- Provide a callback URL when the operation is finally completed so users can create custom logic to notify them (e.g. Email, Slack, etc)
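A rough sketch of how those pieces could fit together, assuming an R2 binding named BUCKET and a hypothetical readNextChunk() helper that serializes the next batch of rows to SQL text. Note that R2 multipart parts must be at least 5 MiB except the final part.

```typescript
// Types (DurableObjectState, R2Bucket, R2UploadedPart) come from @cloudflare/workers-types.
interface Env {
  BUCKET: R2Bucket;
}

export class DumpCoordinator {
  constructor(private state: DurableObjectState, private env: Env) {}

  async startDump(
    readNextChunk: () => Promise<string | null>, // hypothetical row serializer
    callbackUrl?: string
  ): Promise<string> {
    // e.g. dump_20240101-170000.sql for 2024-01-01 17:00:00
    const stamp = new Date().toISOString().slice(0, 19)
      .replace(/-/g, '').replace('T', '-').replace(/:/g, '');
    const key = `dump_${stamp}.sql`;

    const upload = await this.env.BUCKET.createMultipartUpload(key);
    const parts: R2UploadedPart[] = [];

    let chunk: string | null;
    while ((chunk = await readNextChunk()) !== null) {
      parts.push(await upload.uploadPart(parts.length + 1, chunk));
      // Persist progress so an alarm() handler could resume via
      // BUCKET.resumeMultipartUpload(key, upload.uploadId) after a timeout.
      await this.state.storage.put('dump:progress', {
        key,
        uploadId: upload.uploadId,
        parts,
      });
    }

    await upload.complete(parts);
    await this.state.storage.delete('dump:progress');

    // Callback so users can wire up custom notifications (Email, Slack, etc.).
    if (callbackUrl) {
      await fetch(callbackUrl, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ status: 'completed', key }),
      });
    }
    return key;
  }
}
```

The stored progress record is what an alarm() handler would read to decide which part number to continue from before completing the multipart upload.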