
Commit ffcf126: Initial commit (0 parents)

File tree: 12 files changed, 827 additions and 0 deletions


.gitignore

Lines changed: 2 additions & 0 deletions
/.idea
*.iml

Dockerfile

Lines changed: 41 additions & 0 deletions
FROM alpine:3.22 AS base
ARG TARGETARCH
# Install rclone and create a temporary dir for the backup files
ARG VERSION_RCLONE=current
RUN wget https://downloads.rclone.org/${VERSION_RCLONE#current}/rclone-${VERSION_RCLONE}-linux-${TARGETARCH/aarch/arm}.zip -O rclone.zip && \
    unzip -j rclone.zip 'rclone*/rclone' -d /usr/local/bin && \
    rm rclone.zip && \
    addgroup -g 1001 dbbackup && \
    adduser -u 1001 -G dbbackup -D dbbackup && \
    install -d -o 1001 -g 1001 -m 1777 /scratch

RUN --mount=type=cache,target=/etc/apk/cache apk add --update-cache rage envsubst

# Tell rclone not to attempt to read/write a config file by default - all
# configuration will be coming from environment variables
ENV RCLONE_CONFIG=/dev/null
# Assume RCLONE_CONFIG_STORE_* for configuration by default
ENV REMOTE_NAME=store
WORKDIR /scratch

COPY common.sh /common.sh

FROM base AS postgresql

RUN --mount=type=cache,target=/etc/apk/cache apk add postgresql17-client

COPY postgresql/backup.sh /backup.sh

USER 1001:1001

ENTRYPOINT ["/backup.sh"]


FROM base AS mariadb

RUN --mount=type=cache,target=/etc/apk/cache apk add mariadb-client

COPY mariadb/backup.sh /backup.sh

USER 1001:1001

ENTRYPOINT ["/backup.sh"]

README.md

Lines changed: 168 additions & 0 deletions
# Database backups with rclone

This is a generalized version of the popular [schickling/dockerfiles](https://github.com/schickling/dockerfiles) container images `postgres-backup-s3` and `mysql-backup-s3`, which take SQL backups of the relevant database type, optionally encrypt the backup file, and upload it to Amazon S3 (or an API-compatible data store). Those images work well but have a number of limitations:

- the backing store _must_ be S3 or some compatible datastore supported by the `aws` CLI
- authentication must use static credentials supplied in non-standard environment variables `S3_ACCESS_KEY_ID`, etc. (at least for the postgresql backup image)
- the encryption algorithm used by the postgresql image is `aes-256-cbc`, which lacks in-built authentication and has other known shortcomings

This tool uses [rage](https://github.com/str4d/rage) for encryption and [rclone](https://rclone.org) to upload the backup files rather than the `aws` CLI, meaning you can use [any other supported `rclone` backend](https://rclone.org/overview/) as the data store, including but not limited to:

- S3 or compatible data stores, using any authentication method supported by the AWS Go SDK (static credentials, IAM roles for containers, `AssumeRoleWithWebIdentity`, etc.)
- Other cloud storage services such as Google Drive or Google Cloud, Azure Blob or File Storage, Dropbox, and many others
- Private cloud object stores such as OpenStack Swift
- An SFTP, SMB or WebDAV file server

## Usage

The `ghcr.io/gatenlp/postgresql-backup-rclone` and `ghcr.io/gatenlp/mariadb-backup-rclone` images are designed to defer where possible to the underlying tools' native configuration mechanisms rather than introducing our own. This makes them slightly more complicated to set up when compared to the original `schickling/dockerfiles` images, but opens up the full flexibility of the underlying tools. The [examples](examples) folder has some sample manifests that show how you might deploy a Kubernetes `CronJob` that does daily backups to a variety of data stores.

There are a small number of environment variables that are interpreted directly by the script (a combined example follows the list):

- `BACKUP_DATABASES`: the names of the databases from the target server that you want to back up, separated by commas. Each named database will be backed up to a different file.
  - alternatively, you can set `BACKUP_ALL=true` to dump _all_ databases into a single file (the `--all-databases` option to `mysqldump`, or the `pg_dumpall` tool for PostgreSQL)
- `BACKUP_FILE_NAME`: a file name or file name pattern to which the backup will be written. The pattern may include `strftime` date formatting directives to include the date and time of the backup as part of the file name, and may include subdirectories. For example `%Y/%m/backup-%Y-%m-%dT%H-%M-%S` would include the full date and time in the file name, and place it in a folder named for the year and month, e.g. `2025/08/backup-2025-08-12T13-45-15`. The pattern should include only ASCII letters, numbers, `_`, `-`, `/` and `.` characters; anything else will be changed to a hyphen, and a `.sql.gz` suffix will be added if it is not already present.
  - if not using `BACKUP_ALL` mode, the `BACKUP_FILE_NAME` should include a placeholder `$DB` or `${DB}` which will be replaced by the name of the database. This is _required_ if more than one database is named by `BACKUP_DATABASES`
- `REMOTE_NAME`: name of the `rclone` "remote" that defines the target datastore - this can be either the _name_ of a remote that is configured with standard rclone environment variables or configuration file, or it can be a [_connection string_](https://rclone.org/docs/#connection-strings) starting with `:` that provides the remote configuration inline, e.g. `:s3,env_auth`. The default value if not specified is `store`, which would then typically be configured with environment variables of the form `RCLONE_CONFIG_STORE_{option}`.
- `UPLOAD_PREFIX`: optional prefix to prepend to the generated file name to give the final location within the rclone remote. For example, if the remote is S3 this could be the name of the bucket.

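For example, a hypothetical configuration that backs up two databases to date-stamped files under a `daily/` prefix (all names and values below are illustrative only) might look like:

```shell
BACKUP_DATABASES=appdb,authdb
BACKUP_FILE_NAME='daily/${DB}-%Y-%m-%dT%H-%M-%S'  # a .sql.gz suffix is appended automatically
UPLOAD_PREFIX=my-backups
REMOTE_NAME=store   # the default; configured via RCLONE_CONFIG_STORE_* as described below
```
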
## Database connection parameters

The parameters for connecting to the database are provided using the native methods of each database client: typically a set of environment variables, command-line options, a bind-mounted configuration file, or a combination of these.

### PostgreSQL

The [`pg_dump`](https://www.postgresql.org/docs/current/app-pgdump.html) tool can take configuration from environment variables, files, and/or command-line parameters. In most cases you will probably use the following environment variables:

- `PGHOST`: hostname of the database server
- `PGPORT`: port number, if not the default 5432
- `PGUSER`: username to authenticate as
- `PGPASSWORD`: password for that username
  - if you specify `PGPASSWORD_FILE` then the script will read the contents of that file into the `PGPASSWORD` variable
  - alternatively you can provide your own `.pgpass` formatted file with credentials, and reference that with the `PGPASSFILE` environment variable

Any additional command-line options passed to the container will be forwarded unchanged to `pg_dump` or `pg_dumpall` as appropriate.

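As an illustration, a local test run of the PostgreSQL image might look like the following (hostnames, paths and database names are placeholders, and the rclone/store variables described below are omitted):

```shell
docker run --rm \
  -e PGHOST=db.example.internal \
  -e PGPORT=5432 \
  -e PGUSER=backup \
  -e PGPASSWORD_FILE=/secrets/pg-password \
  -v "$PWD/pg-password:/secrets/pg-password:ro" \
  -e BACKUP_DATABASES=appdb \
  ghcr.io/gatenlp/postgresql-backup-rclone --no-owner  # extra arguments are passed to pg_dump
```
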
### MariaDB / MySQL

The [`mariadb-dump`](https://mariadb.com/docs/server/clients-and-utilities/backup-restore-and-import-clients/mariadb-dump) tool can take configuration from [environment variables](https://mariadb.com/docs/server/server-management/install-and-upgrade-mariadb/configuring-mariadb/mariadb-environment-variables), option files, and/or command-line parameters. In most cases you will probably use the following environment variables or parameters:

- `MYSQL_HOST` or `--host=...`: hostname of the database server
- `MYSQL_TCP_PORT` or `--port=...`: port number, if not the default 3306
- `MYSQL_PWD` or `--password=...`: password for authentication
  - if you specify the environment variable `MYSQL_PWD_FILE` then the script will read the contents of that file into the `MYSQL_PWD` variable
- `--user=...`: username for authentication; note that `mariadb-dump` does not provide an environment variable alternative for this option, so it can only be supplied on the command line or in an option file.

Any additional command-line options passed to the container will be forwarded unchanged to the `mysqldump` command.

Alternatively you can provide the connection and authentication details in a `my.cnf`-style "options file" bind-mounted into the container at `/etc/mysql`, or at some other location if you specify an argument of `--defaults-extra-file=/path/to/my.cnf`.

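For example, a minimal options file (all values are placeholders) bind-mounted into the container and referenced with `--defaults-extra-file` might contain:

```ini
[client]
host     = db.example.internal
port     = 3306
user     = backup
password = changeme
```
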
## Rclone data store configuration

There are three basic ways to configure `rclone` to talk to your data store:

1. use environment variables `RCLONE_CONFIG_STORE_*`
2. bind-mount an `rclone.conf` file into your container, and set `RCLONE_CONFIG=/path/to/rclone.conf`
3. set `REMOTE_NAME` to a full connection string starting with `:`

In most cases option 1 will be the simplest. The following sections provide examples for common datastore types.

> **Note**: There is no way to pass command line parameters through to `rclone`, but _every_ parameter to `rclone` has an environment variable equivalent - take the long option form, replace the leading `--` with `RCLONE_`, change the remaining hyphens to underscores and convert to upper-case. E.g. `--max-connections 3` on the command line becomes `RCLONE_MAX_CONNECTIONS=3` in the environment.

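As a sketch of how options 1 and 3 relate (the S3 settings shown here are simply the example from the next section), the same remote can be expressed either as a set of environment variables or as an inline connection string:

```shell
# Option 1: a remote named "store", configured through environment variables
RCLONE_CONFIG_STORE_TYPE=s3
RCLONE_CONFIG_STORE_PROVIDER=AWS
RCLONE_CONFIG_STORE_ENV_AUTH=true

# Option 3: the equivalent remote supplied inline as a connection string
REMOTE_NAME=':s3,provider=AWS,env_auth=true'
```
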
### Amazon S3

- `RCLONE_CONFIG_STORE_TYPE=s3`
- `RCLONE_CONFIG_STORE_PROVIDER=AWS`
- `RCLONE_CONFIG_STORE_ENV_AUTH=true`
- If your bucket uses `SSE_KMS` server-side encryption then you should also set `RCLONE_IGNORE_CHECKSUM=true`, since SSE breaks the checksum tests that rclone normally attempts to perform
- By default, `rclone` will check whether the bucket exists before uploading to it, and make a `HEAD` request after uploading each file to check that the upload was successful. These checks require _read_ access to the bucket, so if your credentials have "write-only" permission (i.e. the IAM policy permits `s3:PutObject` but not `s3:GetObject`), then you will need to disable these checks by setting:
  - `RCLONE_S3_NO_CHECK_BUCKET=true`
  - `RCLONE_S3_NO_HEAD=true`

You then need to provide the region name and credentials, in some form that the AWS SDK understands. The region is set in the variable `AWS_REGION`, e.g. `AWS_REGION=us-east-1`. For credentials, the most common option is `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` for static credentials, but other supported authentication schemes include:

- `AWS_WEB_IDENTITY_TOKEN_FILE`, `AWS_ROLE_ARN` and `AWS_ROLE_SESSION_NAME` to assume an IAM role from a JWT token
- `AWS_CONTAINER_CREDENTIALS_FULL_URI` (and `AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE`) to set an HTTP/HTTPS endpoint that serves temporary credentials, and an authorization token to use when calling it - this is set up for you automatically when using "pod identity" in EKS

`UPLOAD_PREFIX` would then be set to the `bucketname/prefix` where you want to store your backups.

To use server-side encryption with a customer-provided key:

- `RCLONE_S3_SSE_CUSTOMER_ALGORITHM=AES256`
- `RCLONE_S3_SSE_CUSTOMER_KEY=<your key>` or `RCLONE_S3_SSE_CUSTOMER_KEY_BASE64=<your key in base64>`

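Putting these together, a hypothetical environment for write-only backups to an S3 bucket using static credentials (bucket name, region and keys are placeholders) might be:

```shell
RCLONE_CONFIG_STORE_TYPE=s3
RCLONE_CONFIG_STORE_PROVIDER=AWS
RCLONE_CONFIG_STORE_ENV_AUTH=true
RCLONE_S3_NO_CHECK_BUCKET=true   # credentials only allow s3:PutObject
RCLONE_S3_NO_HEAD=true
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXXXXXX
AWS_SECRET_ACCESS_KEY=xxxxxxxx
UPLOAD_PREFIX=my-backup-bucket/postgres
```
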
### S3-compatible service

The same approach works for other services or self-hosted datastores that are compatible with the S3 API; you just need to set `RCLONE_CONFIG_STORE_PROVIDER=Minio` (or whatever provider you are using) and `AWS_ENDPOINT_URL_S3` to point to your provider's endpoint.

### Azure Blob Storage

- `RCLONE_CONFIG_STORE_TYPE=azureblob`

If you are authenticating using a container-level or account-level SAS token then the only other required environment variable would be

- `RCLONE_CONFIG_STORE_SAS_URL=https://accountname.blob.core.windows.net/container?<sastoken>`

For any other authentication style, you must specify the account name

- `RCLONE_CONFIG_STORE_ACCOUNT=accountname`

and then either `RCLONE_CONFIG_STORE_KEY={storage-account-key}` for shared key authentication, or `RCLONE_CONFIG_STORE_ENV_AUTH=true` for Entra ID authentication. The [`env_auth` option](https://rclone.org/azureblob/#env-auth) will handle authentication with a service principal, workload identity (if running in AKS), or managed service identity as appropriate.

`UPLOAD_PREFIX` would specify the _container_ name and any prefix within that container - the _account_ name is part of the remote definition and comes from `RCLONE_CONFIG_STORE_ACCOUNT` or the SAS URL.

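For instance, Entra ID authentication to a storage account (account, container and prefix names are placeholders) could look like:

```shell
RCLONE_CONFIG_STORE_TYPE=azureblob
RCLONE_CONFIG_STORE_ACCOUNT=mystorageaccount
RCLONE_CONFIG_STORE_ENV_AUTH=true   # service principal, workload identity or MSI
UPLOAD_PREFIX=backups/postgres      # container "backups", prefix "postgres"
```
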
### SFTP

- `RCLONE_CONFIG_STORE_TYPE=sftp`
- `RCLONE_CONFIG_STORE_SHELL_TYPE=none`
- `RCLONE_CONFIG_STORE_HOST=sftp-server-hostname`
- `RCLONE_CONFIG_STORE_USER=myuser`
- `RCLONE_CONFIG_STORE_KEY_FILE=/path/to/privatekey`
- `RCLONE_CONFIG_STORE_KNOWN_HOSTS_FILE=/path/to/known_hosts`

You will need to mount the private key and `known_hosts` file into your container, and set `UPLOAD_PREFIX` to the path on the server where you want to store the backup files. Relative paths are resolved against the home directory of the authenticating user; if you want to store the files elsewhere then set the prefix to an _absolute_ path (starting with `/`, e.g. `UPLOAD_PREFIX=/mnt/backups`).

### SMB server

- `RCLONE_CONFIG_STORE_TYPE=smb`
- `RCLONE_CONFIG_STORE_SMB_HOST=smb-server-hostname`
- `RCLONE_CONFIG_STORE_SMB_USER=myuser`
- `RCLONE_CONFIG_STORE_SMB_PASS=mypassword`
- `RCLONE_CONFIG_STORE_SMB_DOMAIN=workgroup`

The `UPLOAD_PREFIX` should be of the form `sharename/path`.

## Encryption

By default, the SQL dump files are stored as-is in the remote data store. If this is an off-site backup it may be desirable to have the files encrypted before upload.

These images support encryption using [rage](https://github.com/str4d/rage), which implements the https://age-encryption.org/v1 spec for file encryption. It encrypts the data stream with a random session key using the `ChaCha20-Poly1305` cipher, then encrypts that session key using an elliptic curve asymmetric cipher. The `rage` implementation can use the SSH `ed25519` key format, and that is the simplest way to enable encryption:

1. Generate a public & private key pair using `ssh-keygen -t ed25519`
2. Mount the _public_ key into your backup container
3. Set `ENCRYPT_RECIPIENTS_FILE=/path/to/id_ed25519.pub`

This will encrypt all files using the given public key (adding a `.age` extension to the file name) before uploading them to the data store. If you need to restore from such a file then you can decrypt it using the corresponding _private_ key, e.g. for PostgreSQL:

```shell
rage -d -i /path/to/id_ed25519 mydb.sql.gz.age | gunzip | psql -X -d newdb
```

Alternatively you can generate standard `age` key pairs using `rage-keygen` and then specify the `age1....` recipient (public key) string directly in the environment variable `ENCRYPT_RECIPIENTS`, then use the corresponding private keys to decrypt when you need to restore from the backups.

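For example, assuming the standard `rage-keygen`/`rage` command-line usage, the `age` key-pair workflow might look like this (the key file name is arbitrary):

```shell
# Generate an age key pair; the "age1..." recipient (public key) is printed
# and also recorded as a comment in the key file
rage-keygen -o backup-key.txt

# Give the recipient string to the backup container
ENCRYPT_RECIPIENTS=age1...

# To restore later, decrypt with the private key file, e.g. for MariaDB:
rage -d -i backup-key.txt mydb.sql.gz.age | gunzip | mariadb newdb
```
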
## Developer information

### Building the images

Images are built using `docker buildx bake` - running this on its own will build both the postgresql and mariadb images for your local architecture and load them into your docker image store to be run on your local machine.

To build just one or the other image, specify the name to the `bake` command, e.g. `docker buildx bake mariadb`.

To build multi-platform images and push them to a registry, use:

```shell
PROD=true DBBR_REGISTRY=ghcr.io/gatenlp/ docker buildx bake --push
```

`PROD=true` enables multi-platform image building (your `buildx` builder must be capable of generating these), and `DBBR_REGISTRY` is the registry prefix to which the images should be pushed. By default the images are tagged with both `:latest` and `:rclone-vX.Y.Z` for the version of `rclone` that they include.

common.sh

Lines changed: 150 additions & 0 deletions
function check_config() {
    # Sanity check rclone configuration
    if [ -z "${REMOTE_NAME}" ]; then
        echo "You need to specify the REMOTE_NAME environment variable"
        exit 1
    fi

    # If an UPLOAD_PREFIX was provided, ensure it ends with a slash
    if [ -n "${UPLOAD_PREFIX}" ]; then
        UPLOAD_PREFIX="${UPLOAD_PREFIX%/}/"
    fi

    if [ -z "${BACKUP_DATABASES}" -a "${BACKUP_ALL}" != "true" ]; then
        echo "You need to set the BACKUP_DATABASES environment variable, or set BACKUP_ALL=true to dump all databases."
        exit 1
    fi
}

# Check whether the given string contains any commas, i.e. if splitting on
# comma would yield more than one item.
function has_comma() {
    case "$1" in
        *,*)
            return 0
            ;;
        *)
            return 1
            ;;
    esac
}

# Basic sanitizing of destination file names
function check_filename() {
    case "$1" in
        /*)
            echo "Destination file name may not start with slash"
            exit 2
            ;;
        *../*)
            echo "Destination file name may not include ../ path segments"
            exit 2
            ;;
    esac
}

function encrypting() {
    if [ -n "${ENCRYPT_RECIPIENTS}" -o -n "${ENCRYPT_RECIPIENTS_FILE}" ]; then
        return 0
    else
        return 1
    fi
}

function post_process() {
    if encrypting; then
        if [ -n "${ENCRYPT_RECIPIENTS}" ]; then
            RAGE_OPTS=""
            for RECIP in ${ENCRYPT_RECIPIENTS} ; do
                RAGE_OPTS="$RAGE_OPTS -r $RECIP"
            done
            gzip | rage $RAGE_OPTS
        else
            gzip | rage -R "${ENCRYPT_RECIPIENTS_FILE}"
        fi
    else
        gzip
    fi
}

# Upload a directory full of files to the destination location in the rclone remote
function upload() {
    DIR="$1"

    rclone copy "$DIR" "${REMOTE_NAME%:}:${UPLOAD_PREFIX}"
}

# The main entrypoint for both postgresql and mariadb backups.
# Expects the DB-specific entrypoint script to have defined
# two functions, do_dump_all to dump all databases, and
# do_dump_one that takes a single parameter for the name of
# the database, and dumps that single database.
function do_backup() {
    check_config

    touch /tmp/started-at

    BACKUP_DIR=$(mktemp -d /scratch/backup.XXXXXX)
    trap 'rm -rf $BACKUP_DIR' EXIT

    if [ "${BACKUP_ALL}" = "true" ]; then
        DEST_FILE_PATTERN="all_%Y-%m-%dT%H-%M-%SZ.sql.gz"

        if [ -n "${BACKUP_FILE_NAME}" ]; then
            DEST_FILE_NO_GZ="${BACKUP_FILE_NAME%.gz}"
            DEST_FILE_PATTERN="${DEST_FILE_NO_GZ%.sql}.sql.gz"
        fi

        DEST_FILE=$( date -r /tmp/started-at +"$DEST_FILE_PATTERN" | sed -e 's/[^a-zA-Z0-9_.\/]\+/-/g' -e 's/^-\|-$//g' )

        check_filename "${DEST_FILE}"

        if encrypting; then
            DEST_FILE="${DEST_FILE%.age}.age"
        fi

        echo "Creating dump of all databases..."
        mkdir -p "$(dirname "${BACKUP_DIR}/${DEST_FILE}")"
        do_dump_all "$@" | post_process > "${BACKUP_DIR}/${DEST_FILE}"
    else
        DEST_FILE_PATTERN='${DB}-%Y-%m-%dT%H-%M-%SZ.sql.gz'

        if [ -n "${BACKUP_FILE_NAME}" ]; then
            DEST_FILE_NO_GZ="${BACKUP_FILE_NAME%.gz}"
            DEST_FILE_PATTERN="${DEST_FILE_NO_GZ%.sql}.sql.gz"
        fi
        case "$DEST_FILE_PATTERN" in
            *\$DB*|*\$\{DB\}*)
                # This is ok
                ;;
            *)
                if has_comma "$BACKUP_DATABASES"; then
                    echo 'Destination file pattern does not include a $DB placeholder, and multiple databases are being dumped - this is not allowed as the later dump files would overwrite the first one.'
                    exit 3
                fi
        esac

        OIFS="$IFS"
        IFS=','
        for DB in $BACKUP_DATABASES
        do
            IFS="$OIFS"
            export DB
            THIS_DEST_FILE_PATTERN="$( echo -n "${DEST_FILE_PATTERN}" | envsubst '$DB' )"
            DEST_FILE=$( date -r /tmp/started-at +"$THIS_DEST_FILE_PATTERN" | sed -e 's/[^a-zA-Z0-9_.\/]\+/-/g' -e 's/^-\|-$//g' )

            check_filename "${DEST_FILE}"

            if encrypting; then
                DEST_FILE="${DEST_FILE%.age}.age"
            fi

            echo "Creating dump of database ${DB}..."
            mkdir -p "$(dirname "${BACKUP_DIR}/${DEST_FILE}")"
            do_dump_one "$DB" "$@" | post_process > "${BACKUP_DIR}/${DEST_FILE}"
        done
    fi

    echo "Uploading backup files"
    upload "$BACKUP_DIR"
}
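
The `do_backup` function expects the DB-specific entrypoint to define `do_dump_all` and `do_dump_one` before calling it. As a minimal sketch (illustrative only, not the actual `postgresql/backup.sh` from this commit), such an entrypoint might look like:

```shell
#!/bin/sh
set -e

. /common.sh

# Dump every database on the server (used when BACKUP_ALL=true)
do_dump_all() {
    pg_dumpall "$@"
}

# Dump a single named database; remaining arguments are forwarded to pg_dump
do_dump_one() {
    DB_NAME="$1"
    shift
    pg_dump "$@" "$DB_NAME"
}

do_backup "$@"
```
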
