
Database dumps do not work on large databases #59

@Brayden

Description


Describe the bug
If you use any of the database dump endpoints (SQL, CSV, or JSON), the entire dataset is loaded into memory and then written out as a dump file. To support databases of any size, we should investigate enhancements that allow exporting regardless of size. The current size limitation for Durable Objects is 1GB, with 10GB coming in the future, so operate under the assumption that we might be attempting to dump a 10GB database into a .sql file.

Another consideration is that Durable Objects execute operations synchronously, so we may need to allow for "breathing intervals". For example, the export operation could run for 5 seconds, then pause for 5 seconds if other requests are queued, and then pick up again. The goal here is to prevent locking the database for long periods of time.
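A minimal sketch of such a loop, assuming hypothetical exportNextChunk() and hasPendingRequests() helpers (neither exists in the codebase today) and the 5-second windows from the example above:

const WORK_WINDOW_MS = 5_000;
const BREATHING_PAUSE_MS = 5_000;

// Runs the export in 5-second work slices, pausing between slices when other
// requests are waiting so the database is not locked for long stretches.
async function dumpWithBreathingIntervals(
  exportNextChunk: () => Promise<boolean>, // hypothetical; returns false once the dump is finished
  hasPendingRequests: () => boolean        // hypothetical; true if other requests are queued
): Promise<void> {
  let windowStart = Date.now();
  while (await exportNextChunk()) {
    if (Date.now() - windowStart >= WORK_WINDOW_MS) {
      if (hasPendingRequests()) {
        // Yield for 5 seconds so queued requests can be served first.
        await new Promise((resolve) => setTimeout(resolve, BREATHING_PAUSE_MS));
      }
      windowStart = Date.now();
    }
  }
}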

But this then poses a few questions:

  1. How do we continue operations that need more than 30 seconds to work?
  2. Where is the data stored as it's being created? (R2, S3, something else)?
  3. How do we deliver that dump information to the user after it's completed?

To Reproduce
Steps to reproduce the behavior:

  1. Hit the /export/dump endpoint on a large database
  2. The request will eventually fail when the 30-second response window closes

Run the following command in Terminal (replace the URL with yours); if the operation exceeds 30 seconds, you should see a failed network response instead of a dump file.

curl --location 'https://starbasedb.YOUR-ID-HERE.workers.dev/export/dump' \
--header 'Authorization: Bearer ABC123' \
--output database_dump.sql

If you can't create a large enough test database, feel free to add code that sleeps for 29 seconds before the /export/dump functional code runs; you should then see the same failure.
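For example, a rough sketch of that delay (handleDump is a hypothetical stand-in for the existing /export/dump handler, not its real name):

async function simulateSlowDump(
  request: Request,
  handleDump: (req: Request) => Promise<Response> // hypothetical stand-in for the /export/dump handler
): Promise<Response> {
  // Sleep for 29 seconds so the total handling time crosses the ~30 second window.
  await new Promise((resolve) => setTimeout(resolve, 29_000));
  return handleDump(request);
}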

Expected behavior
As a user, I would expect all of the specified data to be dumped without an error and without partial results. Where the dump ends up for the user to access when the operation takes more than 30 seconds is up for discussion. Ideally, if it takes less than 30 seconds, it could be returned directly, as the cURL above works today (downloading the file from the response of the origin request); after the timeout, perhaps the export continues and uploads the file to a destination the user can access afterwards?

Proposed Solution:

  1. For backups, require an R2 binding
  2. Create a .sql file in R2 with a filename like dump_20240101-170000.sql, where the timestamp represents 2024-01-01 17:00:00
  3. Create the file and continuously append new chunks to it until reaching the end of the database
  4. We may need to utilize a DO alarm to continue the work after X time if a timeout occurs, and record where the process currently is in internal storage so it can pick up and continue (see the sketch after this list)
  5. Provide a callback URL that is invoked when the operation finally completes so users can create custom logic to notify them (e.g. Email, Slack, etc.)
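A rough sketch of steps 2-4 under a few assumptions: an R2 bucket binding named DUMP_BUCKET, a StarbaseDumper Durable Object class, a readRows() helper that returns the next batch of rows as SQL text, and an arbitrary 20-second work budget per alarm. None of these names exist in the codebase today. Chunks are appended with R2's multipart upload API, and a DO alarm resumes the dump from the last recorded offset:

interface DumpState {
  key: string;             // e.g. dump_20240101-170000.sql
  uploadId: string;        // R2 multipart upload id
  parts: R2UploadedPart[]; // parts uploaded so far
  nextOffset: number;      // where to resume reading rows
}

export class StarbaseDumper {
  constructor(private ctx: DurableObjectState, private env: { DUMP_BUCKET: R2Bucket }) {}

  async startDump(): Promise<string> {
    // dump_YYYYMMDD-HHMMSS.sql, e.g. dump_20240101-170000.sql for 2024-01-01 17:00:00
    const key = `dump_${new Date().toISOString().replace(/[-:]/g, '').replace('T', '-').slice(0, 15)}.sql`;
    const upload = await this.env.DUMP_BUCKET.createMultipartUpload(key);
    await this.ctx.storage.put<DumpState>('dumpState', { key, uploadId: upload.uploadId, parts: [], nextOffset: 0 });
    await this.ctx.storage.setAlarm(Date.now()); // kick off the first work slice
    return key;
  }

  async alarm(): Promise<void> {
    const state = await this.ctx.storage.get<DumpState>('dumpState');
    if (!state) return;

    const upload = this.env.DUMP_BUCKET.resumeMultipartUpload(state.key, state.uploadId);
    const deadline = Date.now() + 20_000; // assumed budget; stays under the invocation limit

    while (Date.now() < deadline) {
      const chunk = await this.readRows(state.nextOffset);
      if (chunk === null) {
        await upload.complete(state.parts);
        await this.ctx.storage.delete('dumpState');
        // Step 5: invoke the user-provided callback URL here to signal completion.
        return;
      }
      // Note: R2 multipart parts (other than the last) have a minimum size, so in
      // practice chunks would need to be batched up before being uploaded as a part.
      const part = await upload.uploadPart(state.parts.length + 1, chunk.sql);
      state.parts.push(part);
      state.nextOffset = chunk.nextOffset;
      await this.ctx.storage.put('dumpState', state);
    }

    // Out of time: schedule another alarm and pick up from nextOffset later.
    await this.ctx.storage.setAlarm(Date.now() + 5_000);
  }

  // Hypothetical: reads the next batch of rows starting at `offset` and returns
  // the SQL text plus the next offset, or null when the database is exhausted.
  private async readRows(offset: number): Promise<{ sql: string; nextOffset: number } | null> {
    throw new Error('not implemented in this sketch');
  }
}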
