Add opt-in MariaDB-compatible services #161
Draft
bherila wants to merge 16 commits into
Wire MariaDB into the canonical V2 BackupEngine system. Previously
`resolve_engine_key("mariadb")` returned Unsupported, so scheduled
MariaDB backups were silently skipped by enqueue_scheduled_run.
- mariadb_physical: physical `mariadb-backup --stream=mbstream | gzip`
base backup to S3 (PITR engine; analog of postgres_walg). Records
binlog coordinates (file/position/GTID) parsed from mariadb-backup
stderr into the metadata.json companion as the PITR replay anchor.
Success is verified via the "completed OK!" marker since the
container's dash shell has no pipefail.
- mariadb_dump: logical `mariadb-dump` fallback (analog of
postgres_pgdump), no PITR.
- mariadb_exec: shared standalone docker-exec helpers (stream stdout to
file, capture stderr) + binlog-position parser. Kept in temps-backup
to avoid a circular dep on temps-providers.
- dispatch: new "mariadb" arm probing both mariadb-backup and
mariadb-binlog on container `mariadb-{name}`; falls back to
mariadb_dump when the PITR tools are absent.
- plugin: register both engines (7 -> 9).
Credentials flow via MYSQL_PWD env, never argv (cf. PR gotempsh#149); unit
tests pin that invariant plus binlog-coordinate parsing and dispatch
fallback.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
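The binlog-coordinate capture described above can be sketched as a small, dependency-free parser. This is only an illustration: the stderr line shape (`binlog position: filename '...', position '...', GTID of the last change '...'`) is an assumption about what `mariadb-backup` prints, and `BinlogCoordinates`, `parse_binlog_coordinates`, and `backup_completed_ok` are hypothetical names, not the PR's actual `parse_binlog_position` implementation.

```rust
/// Binlog coordinates that anchor PITR replay for one physical base backup.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BinlogCoordinates {
    pub file: String,
    pub position: u64,
    pub gtid: Option<String>,
}

/// Finds `label` in `text` and returns the next single-quoted value plus the
/// text that follows its closing quote.
fn take_quoted<'a>(text: &'a str, label: &str) -> Option<(&'a str, &'a str)> {
    let after_label = &text[text.find(label)? + label.len()..];
    let after_open = &after_label[after_label.find('\'')? + 1..];
    let close = after_open.find('\'')?;
    Some((&after_open[..close], &after_open[close + 1..]))
}

/// Hypothetical stand-in for the PR's parser (assumed stderr format).
pub fn parse_binlog_coordinates(stderr: &str) -> Option<BinlogCoordinates> {
    // Use the last matching line in case the position is logged more than once.
    let line = stderr
        .lines()
        .rev()
        .find(|l| l.contains("binlog position:"))?;
    let (file, rest) = take_quoted(line, "filename")?;
    let (position, rest) = take_quoted(rest, "position")?;
    // The GTID part is optional; an empty value is treated as absent.
    let gtid = take_quoted(rest, "GTID")
        .map(|(g, _)| g.to_string())
        .filter(|g| !g.is_empty());
    Some(BinlogCoordinates {
        file: file.to_string(),
        position: position.parse().ok()?,
        gtid,
    })
}

/// True when the last non-empty stderr line ends with the success marker,
/// which stands in for the pipeline exit status when the shell lacks pipefail.
pub fn backup_completed_ok(stderr: &str) -> bool {
    stderr
        .lines()
        .rev()
        .find(|l| !l.trim().is_empty())
        .map_or(false, |l| l.trim_end().ends_with("completed OK!"))
}
```

Checking the marker on the final line rather than anywhere in the output is a design choice of this sketch; the commit message only states that the marker is used.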
…tting
Make Temps-managed MariaDB containers PITR-capable by enabling binary logging, the MariaDB analog of Postgres WAL archiving.
- New BinlogArchiveInterval setting (1m/5m/15m/60m, default 5m) exposed in the parameter schema as an editable dropdown: the user controls PITR granularity (recovery-point objective) per service at runtime.
- binlog_expire_logs_seconds is derived from the interval (>= 6x, floor 1h) so a binlog segment is never purged locally before the archiver ships it (the PITR continuity invariant).
- Container cmd now appends --log-bin, --binlog-format=ROW, a stable non-zero --server-id derived from the service name, --sync-binlog=1 (durable binlog tail), and the derived retention.
- Container env pins TZ=UTC so binlog timestamps, and therefore `mysqlbinlog --stop-datetime` PITR targets (RecoveryTarget::Time is UTC), are unambiguous.
Design note: binlog is enabled by default at (re)create rather than lazily on first backup. This avoids a cross-crate enable trigger (the backup runs in temps-backup; the container lifecycle lives here) and the silent-drop risk of marker-driven args. Existing containers adopt binlog on their next recreate (e.g. image upgrade); a healthy running container is not force-recreated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
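A minimal sketch of the retention derivation and container arguments described in this commit. It assumes the retention is exactly `max(6 * interval, 1h)` (the message only says ">= 6x, floor 1h"), that the retention flag is spelled `--binlog-expire-logs-seconds`, and that the server id comes from an FNV-1a hash of the service name; the commit does not say how the id is derived, so `server_id_for` is purely illustrative.

```rust
/// User-selectable binlog shipping cadence (default 5m, per the commit message).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum BinlogArchiveInterval {
    OneMinute,
    #[default]
    FiveMinutes,
    FifteenMinutes,
    SixtyMinutes,
}

impl BinlogArchiveInterval {
    pub fn as_secs(self) -> u64 {
        match self {
            Self::OneMinute => 60,
            Self::FiveMinutes => 5 * 60,
            Self::FifteenMinutes => 15 * 60,
            Self::SixtyMinutes => 60 * 60,
        }
    }
}

/// Local retention: at least six archive intervals, never below one hour,
/// so a segment is not purged before the archiver has had time to ship it.
pub fn binlog_expire_logs_seconds(interval: BinlogArchiveInterval) -> u64 {
    (interval.as_secs() * 6).max(3600)
}

/// Assumption: a stable, non-zero id from an FNV-1a hash of the service name.
pub fn server_id_for(service_name: &str) -> u32 {
    let mut hash: u32 = 2_166_136_261;
    for byte in service_name.bytes() {
        hash ^= u32::from(byte);
        hash = hash.wrapping_mul(16_777_619);
    }
    hash.max(1) // 0 is remapped to 1 so the id is always non-zero
}

/// Arguments appended to the container command to enable binary logging.
pub fn binlog_args(service_name: &str, interval: BinlogArchiveInterval) -> Vec<String> {
    vec![
        "--log-bin".to_string(),
        "--binlog-format=ROW".to_string(),
        format!("--server-id={}", server_id_for(service_name)),
        "--sync-binlog=1".to_string(),
        format!(
            "--binlog-expire-logs-seconds={}",
            binlog_expire_logs_seconds(interval)
        ),
    ]
}
```

Under these assumptions the 1m and 5m intervals both land on the one-hour floor, while 15m and 60m yield 5400 and 21600 seconds.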
Add the "frequent scheduled ship" half of MariaDB PITR: a per-service
background task that archives closed binary-log segments to S3 so
point-in-time recovery has the logs to replay.
- MariaDbService::archive_binlogs: one idempotent run — FLUSH BINARY
LOGS, SHOW BINARY LOGS, ship each closed segment (download via
bollard tar stream -> gzip -> S3 PUT), update a manifest. Advances
the manifest's last_shipped_file only past segments that actually
uploaded, stopping at the first failure so the replay chain stays
contiguous.
- S3 layout (the contract the restore path consumes):
{prefix}/external_services/mariadb/{svc}/binlog/{file}.gz and
.../binlog/manifest.json (BinlogManifest).
- Pure, unit-tested helpers reused by restore: parse_show_binary_logs,
closed_binlog_files, binlogs_to_ship, binlog_object_key,
binlog_manifest_key.
- Trigger: ExternalServiceHealthMonitor runs the archiver for running
standalone mariadb services on their binlog_archive_interval cadence
(cheap in-memory interval gate checked before the backup-schedule DB
scan). S3 destination is discovered from a covering enabled backup
schedule; no schedule => skip. All archiver failures are swallowed so
health monitoring is never disrupted.
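The pure helpers named above could look roughly like the following. The function names come from the commit message, but the signatures and bodies are assumptions, as is the tab-separated `Log_name File_size` input; `ship_contiguous` is a hypothetical helper showing the "stop at the first failure" rule.

```rust
/// Parses `SHOW BINARY LOGS` rows (assumed: whitespace-separated name and
/// size); a header row is skipped because its size column is not numeric.
pub fn parse_show_binary_logs(output: &str) -> Vec<(String, u64)> {
    output
        .lines()
        .filter_map(|line| {
            let mut cols = line.split_whitespace();
            let name = cols.next()?;
            let size = cols.next()?.parse().ok()?;
            Some((name.to_string(), size))
        })
        .collect()
}

/// Numeric suffix of a segment name, e.g. 3 for `mysql-bin.000003`.
fn sequence(file: &str) -> Option<u64> {
    file.rsplit('.').next()?.parse().ok()
}

/// Every segment except the last one, which is still being written.
pub fn closed_binlog_files(logs: &[(String, u64)]) -> Vec<String> {
    let closed = logs.len().saturating_sub(1);
    logs[..closed].iter().map(|(name, _)| name.clone()).collect()
}

/// Closed segments strictly newer than the manifest's `last_shipped_file`.
pub fn binlogs_to_ship(closed: &[String], last_shipped: Option<&str>) -> Vec<String> {
    let floor = last_shipped.and_then(sequence);
    closed
        .iter()
        .filter(|file| match (sequence(file.as_str()), floor) {
            (Some(seq), Some(floor)) => seq > floor,
            (Some(_), None) => true,
            (None, _) => false,
        })
        .cloned()
        .collect()
}

/// Uploads in order and stops at the first failure; returns the last segment
/// that uploaded, so the shipped chain never has a gap.
pub fn ship_contiguous<F>(to_ship: &[String], mut upload: F) -> Option<String>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut last_ok = None;
    for file in to_ship {
        if upload(file.as_str()).is_err() {
            break;
        }
        last_ok = Some(file.clone());
    }
    last_ok
}

pub fn binlog_object_key(prefix: &str, service: &str, file: &str) -> String {
    let prefix = prefix.trim_end_matches('/');
    format!("{prefix}/external_services/mariadb/{service}/binlog/{file}.gz")
}

pub fn binlog_manifest_key(prefix: &str, service: &str) -> String {
    let prefix = prefix.trim_end_matches('/');
    format!("{prefix}/external_services/mariadb/{service}/binlog/manifest.json")
}
```

Comparing the numeric suffix rather than the raw file name keeps the ordering correct even if the sequence number grows past its zero-padded width.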
Credentials flow only via MYSQL_PWD/MARIADB_PWD exec env, never argv.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
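The "env, never argv" credential invariant can be illustrated with a plain data sketch. `ExecSpec` is a hypothetical struct that mirrors the command and environment lists of a container exec request; it is not the PR's type, and the `mariadb-dump` invocation is only an example command.

```rust
/// Hypothetical exec request: `cmd` is visible in process listings,
/// `env` is not part of argv.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ExecSpec {
    pub cmd: Vec<String>,
    pub env: Vec<String>,
}

/// Builds a dump command whose password travels only through the
/// environment (both variable names mirror the commit message).
pub fn dump_exec_spec(user: &str, password: &str, database: &str) -> ExecSpec {
    ExecSpec {
        cmd: vec![
            "mariadb-dump".to_string(),
            format!("--user={user}"),
            "--single-transaction".to_string(),
            database.to_string(),
        ],
        env: vec![
            format!("MYSQL_PWD={password}"),
            format!("MARIADB_PWD={password}"),
        ],
    }
}

/// The invariant a unit test can pin: no argv element contains the secret.
pub fn argv_is_secret_free(spec: &ExecSpec, secret: &str) -> bool {
    secret.is_empty() || spec.cmd.iter().all(|arg| !arg.contains(secret))
}
```

A test along these lines fails as soon as someone adds a `--password=...` argument, which is the regression the commit message says its unit tests guard against.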
Implement the restore side of MariaDB PITR on the ExternalService trait.
- restore_capabilities: pitr=true, in-place + to-new-service.
- restore_from_s3: detects a physical base.mbstream.gz vs a logical .sql.gz and dispatches; logical path refactored into a shared helper.
- restore_to_new_service: clones the source config onto a fresh port, creates a new container, restores the base into it.
- restore_pitr (core): guards that the base is a physical mariadb-backup with binlog coordinates (logical-only backups are rejected with a greppable "PITR ... physical" error); validates the recovery target BEFORE destroying data; restores the base; then forward-rolls.
Physical base restore mirrors the postgres swap: disable restart policy -> stop -> ephemeral helper container (volumes_from, root) that mbstream-extracts, `mariadb-backup --prepare`, wipes the datadir, `--copy-back`, and `chown -R mysql:mysql` -> re-enable restart policy (always, even on failure) -> start -> wait healthy. The gunzipped mbstream is uploaded onto the created-but-not-started helper's writable layer (volumes_from can't add binds; avoids binary-over-exec corruption).
Forward-roll reads the binlog manifest, downloads every shipped segment >= the base's binlog_file, and replays them in a SINGLE `mysqlbinlog --disable-log-bin --start-position=<base pos> [--stop-datetime | --stop-position] file1 file2 ... | mariadb` invocation (replayed events are not re-logged; a non-zero exit surfaces as a failure, never a silent half-apply).
Recovery-target mapping: Time -> --stop-datetime (UTC); Lsn -> file:pos via --stop-position on the final segment; Xid (GTID) and Name fail fast with typed errors pending the Docker E2E hardening pass.
Credentials flow only via MYSQL_PWD/MARIADB_PWD exec env, never argv.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
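The recovery-target mapping can be sketched as a pure function that builds the decoder arguments and rejects unsupported targets before any data is touched. The `RecoveryTarget` shape below is an assumption (the PR's real type is not shown), and `replay_args` is a hypothetical name; only the flags quoted in the commit message are used.

```rust
/// Assumed shape of the recovery target; the variants follow the commit text.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RecoveryTarget {
    /// UTC timestamp formatted as "YYYY-MM-DD HH:MM:SS".
    Time(String),
    /// Binlog file plus byte position inside that file.
    Lsn { file: String, pos: u64 },
    Xid(String),
    Name(String),
}

/// Builds the binlog-decoder arguments for one replay invocation.
/// Calling this before the base restore validates the target up front.
pub fn replay_args(
    base_position: u64,
    segments: &[String],
    target: &RecoveryTarget,
) -> Result<Vec<String>, String> {
    if segments.is_empty() {
        return Err("no binlog segments to replay".to_string());
    }
    let mut args = vec![
        "--disable-log-bin".to_string(),
        format!("--start-position={base_position}"),
    ];
    let mut files = segments.to_vec();
    match target {
        RecoveryTarget::Time(utc) => args.push(format!("--stop-datetime={utc}")),
        RecoveryTarget::Lsn { file, pos } => {
            // The stop position applies to the final file given, so the list
            // is cut off at the target segment.
            let last = files
                .iter()
                .position(|f| f == file)
                .ok_or_else(|| format!("target segment {file} was not shipped"))?;
            files.truncate(last + 1);
            args.push(format!("--stop-position={pos}"));
        }
        RecoveryTarget::Xid(_) => {
            return Err("PITR to a GTID target is not supported yet".to_string())
        }
        RecoveryTarget::Name(_) => {
            return Err("PITR to a named target is not supported yet".to_string())
        }
    }
    args.extend(files);
    Ok(args)
}
```

Returning `Result<_, String>` keeps the sketch short; the commit message mentions typed errors, which a real implementation would use instead.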
Add a binary-log health probe mirroring postgres_wal_health, surfacing
operational PITR risks to the UI/health metadata:
- BinlogDisabled (Critical) — log_bin off, PITR impossible.
- NonRowBinlogFormat — binlog_format != ROW, PITR fidelity risk.
- LargeBinlogBacklog { segment_count, total_bytes } — local binlogs
accumulating (> 50 segments or > 10 GiB), e.g. archiver behind or
retention too high / disk pressure.
probe_binlog_health connects via sqlx MySQL (root), reads log_bin /
binlog_format / binlog_expire_logs_seconds / gtid_strict_mode and
SHOW BINARY LOGS (count + total bytes), returns None on any connection
failure. Pure compute_warnings is unit-tested (14 tests); a
testcontainers integration suite (3 tests) covers binlog-off, binlog-on,
and bad-connection against real mariadb:lts. Credentials never logged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
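A rough sketch of what a pure `compute_warnings` might look like, using the thresholds quoted above (> 50 segments or > 10 GiB). `BinlogSnapshot` and the early return when binary logging is off are assumptions of this sketch; the PR's actual struct fields and severity handling are not shown.

```rust
pub const MAX_LOCAL_SEGMENTS: u64 = 50;
pub const MAX_LOCAL_BYTES: u64 = 10 * 1024 * 1024 * 1024; // 10 GiB

/// Values the probe reads from the server (hypothetical struct).
#[derive(Debug, Clone)]
pub struct BinlogSnapshot {
    pub log_bin: bool,
    pub binlog_format: String,
    pub segment_count: u64,
    pub total_bytes: u64,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum BinlogWarning {
    BinlogDisabled,
    NonRowBinlogFormat,
    LargeBinlogBacklog { segment_count: u64, total_bytes: u64 },
}

/// Pure function: no I/O, so every branch is easy to unit-test.
pub fn compute_warnings(snapshot: &BinlogSnapshot) -> Vec<BinlogWarning> {
    // Assumption: with binary logging off, the other checks are moot.
    if !snapshot.log_bin {
        return vec![BinlogWarning::BinlogDisabled];
    }
    let mut warnings = Vec::new();
    if !snapshot.binlog_format.eq_ignore_ascii_case("ROW") {
        warnings.push(BinlogWarning::NonRowBinlogFormat);
    }
    if snapshot.segment_count > MAX_LOCAL_SEGMENTS || snapshot.total_bytes > MAX_LOCAL_BYTES {
        warnings.push(BinlogWarning::LargeBinlogBacklog {
            segment_count: snapshot.segment_count,
            total_bytes: snapshot.total_bytes,
        });
    }
    warnings
}
```

Keeping the thresholds as named constants makes the boundary cases (exactly 50 segments, exactly 10 GiB) explicit in tests.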
Restore-validation rehearsal against real mariadb:lts revealed the replay path hardcoded `mysqlbinlog`, which does NOT exist in the official image (only `mariadb-binlog`/`mariadb` are present; `mysqlbinlog`/`mysql` are not), so PITR replay would have failed on every real deployment.
- replay_binlogs now resolves the tool at run time: prefer mariadb-binlog (and mariadb client), fall back to mysqlbinlog/mysql for non-MariaDB images.
- Replay decodes to an intermediate file before feeding the client, so under `set -e` a failed decode aborts and surfaces as an error instead of being masked by the client's exit code in a pipe (no silent half-apply). The prior comment claimed this guard but the code piped.
Verified end-to-end with a shell rehearsal: physical base backup + binlog archive + restore (mbstream extract -> --prepare -> copy-back -> chown) + mariadb-binlog --stop-datetime replay correctly recovers to a point in time (rows before T kept, after T dropped); the mariadb-backup binlog-position stderr format matches parse_binlog_position.
Also adds 25 unit tests bringing MariaDB to parity with the Postgres provider: full database-name validation matrix, config/schema defaults and editability, address/env routing (baremetal + docker), import parsing, and the upgrade-is-not-implemented (MARIADB_AUTO_UPGRADE) decision. 74 mariadb tests pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
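The two fixes in this commit, run-time tool resolution and decoding to an intermediate file instead of piping, might be sketched as below. `resolve_replay_tools`, `replay_script`, and the spool path are hypothetical; the sketch assumes the caller has already probed which binaries exist in the container and passes credentials via the exec environment.

```rust
/// Hypothetical spool file inside the container for the decoded SQL.
const SPOOL_PATH: &str = "/tmp/pitr-replay.sql";

/// Prefers the MariaDB-named tools and falls back to the MySQL names.
/// `available` lists the binaries found in the container.
pub fn resolve_replay_tools(available: &[&str]) -> Option<(&'static str, &'static str)> {
    const CANDIDATES: [(&str, &str); 2] =
        [("mariadb-binlog", "mariadb"), ("mysqlbinlog", "mysql")];
    CANDIDATES
        .iter()
        .copied()
        .find(|(decoder, client)| available.contains(decoder) && available.contains(client))
}

/// POSIX single-quote escaping so arguments with spaces survive `sh -c`.
fn sh_quote(arg: &str) -> String {
    format!("'{}'", arg.replace('\'', r"'\''"))
}

/// Two sequential commands under `set -e`: if the decode step fails, the
/// script aborts before the client applies anything. A pipe would instead
/// report only the client's exit status.
pub fn replay_script(decoder: &str, client: &str, decode_args: &[String]) -> String {
    let quoted: Vec<String> = decode_args.iter().map(|arg| sh_quote(arg)).collect();
    format!(
        "set -e\n{decoder} {args} > {spool}\n{client} < {spool}\n",
        decoder = decoder,
        args = quoted.join(" "),
        spool = SPOOL_PATH,
        client = client,
    )
}
```

The generated script contains no pipe at all, which is the property that matters when the container shell lacks `pipefail`.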
MyISAM/Aria tables are not crash-safe and don't recover consistently under point-in-time recovery, a documented cause of PITR failure. The binlog-health probe now counts non-InnoDB user tables (excluding system schemas) and emits a NonInnodbTables warning so operators can convert them to InnoDB before relying on PITR.
Complements the other PITR-failure guards already in place: sync_binlog=1 and binlog_format=ROW (commit 489f344) address lost-binlog-on-crash and non-deterministic temp/in-memory/distributed-transaction replay; the LargeBinlogBacklog warning flags binlog disk pressure. +2 unit tests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
…PITR
Make MariaDB point-in-time recovery work out of the box. Previously a MariaDB service had no base backup unless an operator manually created a schedule, so binlog archiving had nothing to anchor to.
A reconcile loop (5-min tick in the backup plugin) ensures every MariaDB service has a covering daily full-backup schedule once a default S3 source is configured:
- New `external_services.default_backup_provisioned` boolean (migration m20260623_000001, NOT NULL DEFAULT false, idempotent IF NOT EXISTS) is a one-shot latch: reconcile only provisions services where it's false and sets it true after creating the schedule, so a schedule the operator later deletes is never recreated.
- reconcile_default_external_service_schedules: resolves the default S3 source (skips quietly and retries next tick if none exists yet, which handles storage configured after the service), then for each unprovisioned MariaDB service creates a daily `0 0 3 * * *` (03:00 UTC) full schedule via the existing create_backup_schedule + attach_services_to_schedule (retention 14d, target_all=false, control_plane=false, default source). Per-service failures are logged and skipped so one can't block others; each creation is logged for operator visibility.
Pairs with the 5-minute binlog ship default: daily physical base + continuous binlogs = restore to any point in time, with no manual setup.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
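The one-shot latch behaviour can be modelled in memory as follows. This is a simplified sketch: `ExternalService` here is a stand-in for the database row, the schedule creation is injected as a closure, and `reconcile_default_schedules` is a hypothetical name rather than the PR's function.

```rust
/// Stand-in for an `external_services` row (only the fields the sketch needs).
#[derive(Debug, Clone)]
pub struct ExternalService {
    pub name: String,
    pub service_type: String,
    pub default_backup_provisioned: bool,
}

/// Daily at 03:00 UTC, in the six-field cron form quoted in the commit.
pub const DEFAULT_CRON: &str = "0 0 3 * * *";

/// One reconcile tick. Returns how many schedules were created.
/// `create_schedule` receives (service name, source, cron expression).
pub fn reconcile_default_schedules<F>(
    services: &mut [ExternalService],
    default_source: Option<&str>,
    mut create_schedule: F,
) -> usize
where
    F: FnMut(&str, &str, &str) -> Result<(), String>,
{
    // No default S3 source yet: skip quietly, the next tick retries.
    let Some(source) = default_source else {
        return 0;
    };
    let mut created = 0;
    for service in services.iter_mut() {
        if service.service_type != "mariadb" || service.default_backup_provisioned {
            continue;
        }
        match create_schedule(service.name.as_str(), source, DEFAULT_CRON) {
            Ok(()) => {
                // Latch: set once, so a schedule the operator deletes later
                // is never recreated.
                service.default_backup_provisioned = true;
                created += 1;
            }
            // One failing service must not block the others.
            Err(err) => eprintln!("default schedule for {} failed: {err}", service.name),
        }
    }
    created
}
```

Because the latch is only set after a successful creation, a failed attempt is retried on the next tick while a successful one is never repeated.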
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VxXa3ZKZTjfKs1gQ8zcDcj
reconcile_default_external_service_schedules built the default schedule request with target_all_services=false AND include_control_plane=false, but create_backup_schedule rejects that combination at creation time (nothing to back up, and no services can be attached until the row exists). Every provision therefore failed, was swallowed by the warn+continue in reconcile, and produced zero schedules.
Create the schedule with the control plane temporarily included, attach the MariaDB service, then flip include_control_plane off via update_backup_schedule, which permits the otherwise-empty combination because a service is now attached. This reaches the intended service-scoped end state and makes reconcile_default_schedules_provisions_mariadb_once pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VxXa3ZKZTjfKs1gQ8zcDcj
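The create, attach, then flip sequence can be modelled with a toy validation rule. Everything below is an in-memory illustration under the assumption that both create and update enforce "a schedule must back up something"; the function names echo the commit message, but the signatures are invented for the sketch.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Schedule {
    pub target_all_services: bool,
    pub include_control_plane: bool,
    pub services: Vec<String>,
}

/// The rule both create and update are assumed to enforce.
fn backs_up_something(schedule: &Schedule) -> bool {
    schedule.target_all_services
        || schedule.include_control_plane
        || !schedule.services.is_empty()
}

/// A new schedule has no attached services yet, so at least one of the
/// two flags must be set at creation time.
pub fn create_backup_schedule(
    target_all_services: bool,
    include_control_plane: bool,
) -> Result<Schedule, String> {
    let schedule = Schedule {
        target_all_services,
        include_control_plane,
        services: Vec::new(),
    };
    if backs_up_something(&schedule) {
        Ok(schedule)
    } else {
        Err("schedule would back up nothing".to_string())
    }
}

pub fn update_include_control_plane(schedule: &mut Schedule, include: bool) -> Result<(), String> {
    let mut candidate = schedule.clone();
    candidate.include_control_plane = include;
    if !backs_up_something(&candidate) {
        return Err("schedule would back up nothing".to_string());
    }
    *schedule = candidate;
    Ok(())
}

/// Create with the control plane included, attach the service, then turn
/// the control plane off: the end state is service-scoped.
pub fn provision_service_scoped(service: &str) -> Result<Schedule, String> {
    let mut schedule = create_backup_schedule(false, true)?;
    schedule.services.push(service.to_string());
    update_include_control_plane(&mut schedule, false)?;
    Ok(schedule)
}
```

In this model the direct `create_backup_schedule(false, false)` call fails, which is the bug described above, while the three-step path succeeds.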
Summary
Adds an opt-in MariaDB-compatible external service for hosted projects while leaving Temps' own internal PostgreSQL database unchanged.
This includes:
- mariadb-dump with mysqldump fallback
- Point-in-time recovery (PITR)
- mariadb-binlog for replay (with parity tests)
Notes
The service uses MariaDB images by default while exposing MySQL-compatible connection URLs and environment variables, since most frameworks and drivers connect to MariaDB through MySQL-compatible drivers.
MariaDB remains opt-in. Installing or starting Temps does not pull or run a MariaDB container; a container is created only when a user creates or imports a MariaDB external service. Linked projects receive separate per-project/per-environment databases inside that shared service.
Rebase & scope
main (f815de0b); 14 MariaDB-only commits, 0 behind.
Validation
Ran from ~/proj/temps:
- cargo fmt --all -- --check
- cargo check --lib -p temps-providers -p temps-backup -p temps-migrations: clean (exit 0) on the rebased branch
- cargo test -p temps-providers mariadb --lib: MariaDB-focused provider tests pass (13 tests)
- cd web && bun install && TEMPS_VERSION=pr-validation bun run build
Draft until #151 (Postgres upgrade hardening) is merged, which unblocks the MariaDB PITR e2e suite.