Add opt-in MariaDB-compatible services #161

Draft
bherila wants to merge 16 commits into gotempsh:main from bherila:feature/mariadb-services-upstream-pr
bherila commented Jun 26, 2026

Summary

Adds an opt-in MariaDB-compatible external service for hosted projects while leaving Temps' own internal PostgreSQL database unchanged.

Recreated from #138, which could not be reopened after the branch was rebased. This PR supersedes #138 and is now scoped to MariaDB/PITR only — the Postgres upgrade hardening that previously rode along has been split out into #151.

This includes:

  • a new MariaDB service implementation with create/import/update/delete, credentials, environment bindings, and backup/restore support via mariadb-dump with mysqldump fallback
  • MariaDB query explorer support, including schema/table discovery and guarded SQL execution helpers
  • frontend service presets/forms/icons so users can choose MariaDB from the existing service workflows
  • conservative small-host defaults for MariaDB containers, plus standard/dedicated size profiles for larger hosts
  • MDX docs covering MariaDB managed services, project linking, Laravel usage, backup coverage, and opt-in service startup behavior

Point-in-time recovery (PITR)

  • MariaDB binary logging enabled for PITR, with a configurable interval setting
  • binary logs shipped to S3 for PITR
  • restore + point-in-time recovery using mariadb-binlog for replay (with parity tests)
  • a binlog-health probe surfacing operational warnings
  • non-InnoDB table warnings as a PITR safety guard
  • auto-provisioning of a default daily base schedule for MariaDB PITR
  • repaired MariaDB PITR restore chain

Notes

The service uses MariaDB images by default while exposing MySQL-compatible connection URLs and environment variables, since most frameworks and drivers connect to MariaDB through MySQL-compatible drivers.

MariaDB remains opt-in. Installing or starting Temps does not pull or run a MariaDB container; a container is created only when a user creates or imports a MariaDB external service. Linked projects receive separate per-project/per-environment databases inside that shared service.

Rebase & scope

Validation

Ran from ~/proj/temps:

  • cargo fmt --all -- --check
  • cargo check --lib -p temps-providers -p temps-backup -p temps-migrations — clean (exit 0) on the rebased branch
  • cargo test -p temps-providers mariadb --lib — MariaDB-focused provider tests pass (13 tests)
  • cd web && bun install && TEMPS_VERSION=pr-validation bun run build

Draft until #151 (Postgres upgrade hardening) is merged, which unblocks the MariaDB PITR e2e suite.

bherila and others added 14 commits June 26, 2026 18:31
Wire MariaDB into the canonical V2 BackupEngine system. Previously
`resolve_engine_key("mariadb")` returned Unsupported, so scheduled
MariaDB backups were silently skipped by enqueue_scheduled_run.

- mariadb_physical: physical `mariadb-backup --stream=mbstream | gzip`
  base backup to S3 (PITR engine; analog of postgres_walg). Records
  binlog coordinates (file/position/GTID) parsed from mariadb-backup
  stderr into the metadata.json companion as the PITR replay anchor.
  Success is verified via the "completed OK!" marker since the
  container's dash shell has no pipefail.
- mariadb_dump: logical `mariadb-dump` fallback (analog of
  postgres_pgdump), no PITR.
- mariadb_exec: shared standalone docker-exec helpers (stream stdout to
  file, capture stderr) + binlog-position parser. Kept in temps-backup
  to avoid a circular dep on temps-providers.
- dispatch: new "mariadb" arm probing both mariadb-backup and
  mariadb-binlog on container `mariadb-{name}`; falls back to
  mariadb_dump when the PITR tools are absent.
- plugin: register both engines (7 -> 9).

Credentials flow via MYSQL_PWD env, never argv (cf. PR gotempsh#149); unit
tests pin that invariant plus binlog-coordinate parsing and dispatch
fallback.
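
The "completed OK!" success check described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the function name `backup_completed_ok` is hypothetical, and it assumes the full stderr of `mariadb-backup` has already been captured as a string.

```rust
/// Returns true only when the last non-empty line of the captured stderr
/// ends with the `completed OK!` marker that mariadb-backup prints on success.
/// Hypothetical helper: the name and signature are assumptions.
pub fn backup_completed_ok(stderr: &str) -> bool {
    stderr
        .lines()
        .rev()
        .map(str::trim)
        // Skip trailing blank lines so a final newline does not hide the marker.
        .find(|line| !line.is_empty())
        .map_or(false, |line| line.ends_with("completed OK!"))
}
```

Checking the final line rather than searching the whole output means a marker followed by a later error line is not treated as success, which matters when the shell pipeline's exit status cannot be trusted.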

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
…tting

Make Temps-managed MariaDB containers PITR-capable by enabling binary
logging, the MariaDB analog of Postgres WAL archiving.

- New BinlogArchiveInterval setting (1m/5m/15m/60m, default 5m) exposed
  in the parameter schema as an editable dropdown — the user controls
  PITR granularity (recovery-point objective) per service at runtime.
- binlog_expire_logs_seconds is derived from the interval (>= 6x, floor
  1h) so a binlog segment is never purged locally before the archiver
  ships it (the PITR continuity invariant).
- Container cmd now appends --log-bin, --binlog-format=ROW, a stable
  non-zero --server-id derived from the service name, --sync-binlog=1
  (durable binlog tail), and the derived retention.
- Container env pins TZ=UTC so binlog timestamps — and therefore
  `mysqlbinlog --stop-datetime` PITR targets (RecoveryTarget::Time is
  UTC) — are unambiguous.
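
The retention rule above (at least 6x the archive interval, with a 1h floor) can be sketched like this. The enum variants and function names are assumptions made for illustration, and the sketch uses exactly 6x, the smallest multiple the rule allows.

```rust
use std::time::Duration;

/// Hypothetical model of the BinlogArchiveInterval setting (1m/5m/15m/60m).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BinlogArchiveInterval {
    OneMinute,
    FiveMinutes,
    FifteenMinutes,
    SixtyMinutes,
}

impl Default for BinlogArchiveInterval {
    /// The commit message states a 5m default.
    fn default() -> Self {
        Self::FiveMinutes
    }
}

impl BinlogArchiveInterval {
    pub fn as_duration(self) -> Duration {
        let minutes = match self {
            Self::OneMinute => 1,
            Self::FiveMinutes => 5,
            Self::FifteenMinutes => 15,
            Self::SixtyMinutes => 60,
        };
        Duration::from_secs(minutes * 60)
    }
}

/// Derives binlog_expire_logs_seconds: 6x the interval, never below one hour,
/// so a segment is not purged locally before the archiver has shipped it.
pub fn binlog_expire_logs_seconds(interval: BinlogArchiveInterval) -> u64 {
    const FLOOR_SECS: u64 = 3600;
    (interval.as_duration().as_secs() * 6).max(FLOOR_SECS)
}
```

With these assumptions the 1m and 5m settings both land on the one-hour floor, while 15m and 60m yield 5400 and 21600 seconds.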

Design note: binlog is enabled by default at (re)create rather than
lazily on first backup. This avoids a cross-crate enable trigger (the
backup runs in temps-backup; the container lifecycle lives here) and the
silent-drop risk of marker-driven args. Existing containers adopt binlog
on their next recreate (e.g. image upgrade); a healthy running container
is not force-recreated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
Add the "frequent scheduled ship" half of MariaDB PITR: a per-service
background task that archives closed binary-log segments to S3 so
point-in-time recovery has the logs to replay.

- MariaDbService::archive_binlogs: one idempotent run — FLUSH BINARY
  LOGS, SHOW BINARY LOGS, ship each closed segment (download via
  bollard tar stream -> gzip -> S3 PUT), update a manifest. Advances
  the manifest's last_shipped_file only past segments that actually
  uploaded, stopping at the first failure so the replay chain stays
  contiguous.
- S3 layout (the contract the restore path consumes):
  {prefix}/external_services/mariadb/{svc}/binlog/{file}.gz and
  .../binlog/manifest.json (BinlogManifest).
- Pure, unit-tested helpers reused by restore: parse_show_binary_logs,
  closed_binlog_files, binlogs_to_ship, binlog_object_key,
  binlog_manifest_key.
- Trigger: ExternalServiceHealthMonitor runs the archiver for running
  standalone mariadb services on their binlog_archive_interval cadence
  (cheap in-memory interval gate checked before the backup-schedule DB
  scan). S3 destination is discovered from a covering enabled backup
  schedule; no schedule => skip. All archiver failures are swallowed so
  health monitoring is never disrupted.
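
A rough sketch of the pure helpers and the contiguity rule described above. The helper names match the list, but every signature here is an assumption, and `ship_contiguous` is a hypothetical stand-in for the archive loop. It also assumes binlog names carry zero-padded numeric suffixes of equal width, so lexical order matches creation order.

```rust
/// Every segment except the last is closed; the last one is still being written.
pub fn closed_binlog_files(all: &[String]) -> &[String] {
    match all.split_last() {
        Some((_active, closed)) => closed,
        None => &[],
    }
}

/// Closed segments strictly after the manifest's last_shipped_file.
pub fn binlogs_to_ship<'a>(closed: &'a [String], last_shipped: Option<&str>) -> Vec<&'a str> {
    closed
        .iter()
        .map(String::as_str)
        .filter(|file| last_shipped.map_or(true, |last| *file > last))
        .collect()
}

/// S3 object key for one gzipped segment, following the layout above.
pub fn binlog_object_key(prefix: &str, svc: &str, file: &str) -> String {
    format!("{prefix}/external_services/mariadb/{svc}/binlog/{file}.gz")
}

/// S3 object key for the manifest.
pub fn binlog_manifest_key(prefix: &str, svc: &str) -> String {
    format!("{prefix}/external_services/mariadb/{svc}/binlog/manifest.json")
}

/// Uploads segments in order and returns the new last_shipped_file.
/// Stops at the first failure so the shipped chain never has a gap.
pub fn ship_contiguous<F>(to_ship: &[&str], last_shipped: Option<String>, mut upload: F) -> Option<String>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut cursor = last_shipped;
    for &file in to_ship {
        if upload(file).is_err() {
            break;
        }
        cursor = Some(file.to_string());
    }
    cursor
}
```

Because the cursor only moves past segments that uploaded, a failed run can simply be retried on the next interval and will resume from the first unshipped segment.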

Credentials flow only via MYSQL_PWD/MARIADB_PWD exec env, never argv.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
Implement the restore side of MariaDB PITR on the ExternalService trait.

- restore_capabilities: pitr=true, in-place + to-new-service.
- restore_from_s3: detects a physical base.mbstream.gz vs a logical
  .sql.gz and dispatches; logical path refactored into a shared helper.
- restore_to_new_service: clones the source config onto a fresh port,
  creates a new container, restores the base into it.
- restore_pitr (core): guards that the base is a physical mariadb-backup
  with binlog coordinates (logical-only backups are rejected with a
  greppable "PITR ... physical" error); validates the recovery target
  BEFORE destroying data; restores the base; then forward-rolls.

Physical base restore mirrors the postgres swap: disable restart policy
-> stop -> ephemeral helper container (volumes_from, root) that
mbstream-extracts, `mariadb-backup --prepare`, wipes the datadir,
`--copy-back`, and `chown -R mysql:mysql` -> re-enable restart policy
(always, even on failure) -> start -> wait healthy. The gunzipped
mbstream is uploaded onto the created-but-not-started helper's writable
layer (volumes_from can't add binds; avoids binary-over-exec corruption).

Forward-roll reads the binlog manifest, downloads every shipped segment
>= the base's binlog_file, and replays them in a SINGLE `mysqlbinlog
--disable-log-bin --start-position=<base pos> [--stop-datetime|
--stop-position] file1 file2 ... | mariadb` invocation (replayed events
are not re-logged; a non-zero exit surfaces as a failure, never a silent
half-apply). Recovery-target mapping: Time -> --stop-datetime (UTC);
Lsn -> file:pos via --stop-position on the final segment; Xid (GTID) and
Name fail fast with typed errors pending the Docker E2E hardening pass.
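
The recovery-target mapping above can be sketched as a pure argument builder. The shape of `RecoveryTarget`, the `ReplayError` type, and the `replay_args` function are all assumptions for illustration; only the four flags are taken from the commit message. Arguments are returned as an argv vector, so no shell quoting is involved.

```rust
/// Hypothetical shape of the recovery target; the real type may differ.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum RecoveryTarget {
    /// UTC timestamp formatted as "YYYY-MM-DD HH:MM:SS".
    Time(String),
    /// Binlog file and byte position.
    Lsn { file: String, pos: u64 },
    Xid(String),
    Name(String),
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReplayError {
    NoSegments,
    TargetNotInChain,
    UnsupportedTarget(&'static str),
}

/// Builds the argument list for a single binlog-decode invocation.
/// `segments` is the ordered list of downloaded files starting at the base's binlog_file.
pub fn replay_args(
    base_pos: u64,
    segments: &[String],
    target: &RecoveryTarget,
) -> Result<Vec<String>, ReplayError> {
    if segments.is_empty() {
        return Err(ReplayError::NoSegments);
    }
    let mut args = vec![
        "--disable-log-bin".to_string(),
        // --start-position applies to the first file named.
        format!("--start-position={base_pos}"),
    ];
    let files: &[String] = match target {
        RecoveryTarget::Time(utc) => {
            args.push(format!("--stop-datetime={utc}"));
            segments
        }
        RecoveryTarget::Lsn { file, pos } => {
            // --stop-position applies to the last file named, so truncate the chain there.
            let idx = segments
                .iter()
                .position(|s| s == file)
                .ok_or(ReplayError::TargetNotInChain)?;
            args.push(format!("--stop-position={pos}"));
            &segments[..=idx]
        }
        RecoveryTarget::Xid(_) => return Err(ReplayError::UnsupportedTarget("xid")),
        RecoveryTarget::Name(_) => return Err(ReplayError::UnsupportedTarget("name")),
    };
    args.extend(files.iter().cloned());
    Ok(args)
}
```

Validating the target in a pure function like this is one way to reject an unusable target before any data is destroyed.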

Credentials flow only via MYSQL_PWD/MARIADB_PWD exec env, never argv.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
Add a binary-log health probe mirroring postgres_wal_health, surfacing
operational PITR risks to the UI/health metadata:

- BinlogDisabled (Critical) — log_bin off, PITR impossible.
- NonRowBinlogFormat — binlog_format != ROW, PITR fidelity risk.
- LargeBinlogBacklog { segment_count, total_bytes } — local binlogs
  accumulating (> 50 segments or > 10 GiB), e.g. archiver behind or
  retention too high / disk pressure.

probe_binlog_health connects via sqlx MySQL (root), reads log_bin /
binlog_format / binlog_expire_logs_seconds / gtid_strict_mode and
SHOW BINARY LOGS (count + total bytes), returns None on any connection
failure. Pure compute_warnings is unit-tested (14 tests); a
testcontainers integration suite (3 tests) covers binlog-off, binlog-on,
and bad-connection against real mariadb:lts. Credentials never logged.
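
A minimal sketch of how a pure `compute_warnings` could apply the thresholds above. The `BinlogStatus` struct and the exact signature are assumptions, as is the choice to report only `BinlogDisabled` when `log_bin` is off.

```rust
/// Hypothetical snapshot of the values the probe reads from the server.
#[derive(Debug, Clone)]
pub struct BinlogStatus {
    pub log_bin: bool,
    pub binlog_format: String,
    pub segment_count: u64,
    pub total_bytes: u64,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum BinlogWarning {
    BinlogDisabled,
    NonRowBinlogFormat,
    LargeBinlogBacklog { segment_count: u64, total_bytes: u64 },
}

const MAX_SEGMENTS: u64 = 50;
const MAX_BYTES: u64 = 10 * 1024 * 1024 * 1024; // 10 GiB

/// Pure mapping from probed values to warnings; no I/O, so it is easy to unit test.
pub fn compute_warnings(status: &BinlogStatus) -> Vec<BinlogWarning> {
    if !status.log_bin {
        // Assumption: with binary logging off, the other checks are not meaningful.
        return vec![BinlogWarning::BinlogDisabled];
    }
    let mut warnings = Vec::new();
    if !status.binlog_format.eq_ignore_ascii_case("ROW") {
        warnings.push(BinlogWarning::NonRowBinlogFormat);
    }
    // Thresholds are strict: exactly 50 segments or exactly 10 GiB does not warn.
    if status.segment_count > MAX_SEGMENTS || status.total_bytes > MAX_BYTES {
        warnings.push(BinlogWarning::LargeBinlogBacklog {
            segment_count: status.segment_count,
            total_bytes: status.total_bytes,
        });
    }
    warnings
}
```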

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
Restore-validation rehearsal against real mariadb:lts revealed the replay
path hardcoded `mysqlbinlog`, which does NOT exist in the official image
(only `mariadb-binlog`/`mariadb` are present; `mysqlbinlog`/`mysql` are
not) — PITR replay would have failed on every real deployment.

- replay_binlogs now resolves the tool at run time: prefer mariadb-binlog
  (and mariadb client), fall back to mysqlbinlog/mysql for non-MariaDB
  images.
- Replay decodes to an intermediate file before feeding the client, so
  under `set -e` a failed decode aborts and surfaces as an error instead
  of being masked by the client's exit code in a pipe (no silent
  half-apply). The prior comment claimed this guard but the code piped.
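
The tool resolution and the decode-then-apply order described above might look roughly like this. Everything here is a sketch under assumptions: `ReplayTools`, `resolve_replay_tools`, `replay_script`, and the scratch path `/tmp/replay.sql` are hypothetical, and the sketch assumes the decoder and client are resolved as a matching pair.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ReplayTools {
    pub binlog: &'static str,
    pub client: &'static str,
}

/// Preference order: MariaDB-native names first, MySQL names as fallback.
const CANDIDATES: [(&str, &str); 2] = [("mariadb-binlog", "mariadb"), ("mysqlbinlog", "mysql")];

/// Hypothetical scratch file inside the container.
const SCRATCH: &str = "/tmp/replay.sql";

fn has(available: &[&str], tool: &str) -> bool {
    available.iter().any(|t| *t == tool)
}

/// Picks the first candidate pair whose decoder and client are both present.
pub fn resolve_replay_tools(available: &[&str]) -> Option<ReplayTools> {
    for (binlog, client) in CANDIDATES {
        if has(available, binlog) && has(available, client) {
            return Some(ReplayTools { binlog, client });
        }
    }
    None
}

/// Single-quotes an argument for POSIX sh, escaping embedded single quotes.
fn shell_quote(arg: &str) -> String {
    format!("'{}'", arg.replace('\'', r"'\''"))
}

/// Builds a script that decodes to a file first and only then feeds the client,
/// so under `set -e` a failed decode aborts before anything is applied.
pub fn replay_script(tools: &ReplayTools, args: &[String]) -> String {
    let quoted: Vec<String> = args.iter().map(|a| shell_quote(a)).collect();
    format!(
        "set -e\n{} {} > {path}\n{} < {path}\n",
        tools.binlog,
        quoted.join(" "),
        tools.client,
        path = SCRATCH
    )
}
```

The client line carries no password argument; in keeping with the rest of the PR, credentials would be supplied through the exec environment rather than argv.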

Verified end-to-end with a shell rehearsal: physical base backup +
binlog archive + restore (mbstream extract -> --prepare -> copy-back ->
chown) + mariadb-binlog --stop-datetime replay correctly recovers to a
point in time (rows before T kept, after T dropped); the
mariadb-backup binlog-position stderr format matches parse_binlog_position.

Also adds 25 unit tests bringing MariaDB to parity with the Postgres
provider: full database-name validation matrix, config/schema defaults
and editability, address/env routing (baremetal + docker), import
parsing, and the upgrade-is-not-implemented (MARIADB_AUTO_UPGRADE)
decision. 74 mariadb tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
MyISAM/Aria tables are not crash-safe and don't recover consistently
under point-in-time recovery — a documented cause of PITR failure. The
binlog-health probe now counts non-InnoDB user tables (excluding system
schemas) and emits a NonInnodbTables warning so operators can convert
them to InnoDB before relying on PITR.

Complements the other PITR-failure guards already in place: sync_binlog=1
and binlog_format=ROW (commit 489f344) address lost-binlog-on-crash and
non-deterministic temp/in-memory/distributed-transaction replay; the
LargeBinlogBacklog warning flags binlog disk pressure. +2 unit tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
…PITR

Make MariaDB point-in-time recovery work out of the box. Previously a
MariaDB service had no base backup unless an operator manually created a
schedule, so binlog archiving had nothing to anchor to.

A reconcile loop (5-min tick in the backup plugin) ensures every MariaDB
service has a covering daily full-backup schedule once a default S3
source is configured:
- New `external_services.default_backup_provisioned` boolean (migration
  m20260623_000001, NOT NULL DEFAULT false, idempotent IF NOT EXISTS) is
  a one-shot latch: reconcile only provisions services where it's false
  and sets it true after creating the schedule, so a schedule the
  operator later deletes is never recreated.
- reconcile_default_external_service_schedules: resolves the default S3
  source (skips quietly + retries next tick if none exists yet — handles
  storage configured after the service), then for each unprovisioned
  MariaDB service creates a daily `0 0 3 * * *` (03:00 UTC) full schedule
  via the existing create_backup_schedule + attach_services_to_schedule
  (retention 14d, target_all=false, control_plane=false, default source).
  Per-service failures are logged and skipped so one can't block others;
  each creation is logged for operator visibility.
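
The one-shot latch behaviour above can be modelled in memory as follows. This is a sketch only: the `Service` struct, the `reconcile_default_schedules` signature, and the injected `create_schedule` callback are assumptions standing in for the database rows and the real schedule-creation calls.

```rust
/// Hypothetical in-memory view of an external_services row.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Service {
    pub name: String,
    pub kind: String,
    pub default_backup_provisioned: bool,
}

/// One reconcile tick. Returns how many services were provisioned.
/// `create_schedule(service_name, source)` stands in for creating and attaching the schedule.
pub fn reconcile_default_schedules<F>(
    services: &mut [Service],
    default_source: Option<&str>,
    mut create_schedule: F,
) -> usize
where
    F: FnMut(&str, &str) -> Result<(), String>,
{
    // No default S3 source yet: skip quietly and retry on the next tick.
    let Some(source) = default_source else {
        return 0;
    };
    let mut provisioned = 0;
    for svc in services.iter_mut() {
        if svc.kind != "mariadb" || svc.default_backup_provisioned {
            continue;
        }
        match create_schedule(&svc.name, source) {
            Ok(()) => {
                // Latch: set once, so a schedule deleted later is never recreated.
                svc.default_backup_provisioned = true;
                provisioned += 1;
            }
            // A per-service failure is logged and skipped; the latch stays unset for a retry.
            Err(err) => eprintln!("skipping default schedule for {}: {err}", svc.name),
        }
    }
    provisioned
}
```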

Pairs with the 5-minute binlog ship default: daily physical base +
continuous binlogs = restore to any point in time, with no manual setup.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CCbeSZTdfxBabYBvjZFojC
bherila and others added 2 commits June 26, 2026 19:49
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VxXa3ZKZTjfKs1gQ8zcDcj
reconcile_default_external_service_schedules built the default schedule
request with target_all_services=false AND include_control_plane=false,
but create_backup_schedule rejects that combination at creation time
(nothing to back up, and no services can be attached until the row
exists). Every provision therefore failed, was swallowed by the
warn+continue in reconcile, and produced zero schedules.

Create the schedule with the control plane temporarily included, attach
the MariaDB service, then flip include_control_plane off via
update_backup_schedule -- which permits the otherwise-empty combination
because a service is now attached. This reaches the intended
service-scoped end state and makes reconcile_default_schedules_provisions_mariadb_once pass.
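
The create, attach, then flip sequence above can be illustrated with a small in-memory model. The function names echo the ones in this commit message, but the signatures, the `Schedule` struct, and the validation rule as written here are simplified assumptions, not the real API.

```rust
/// Hypothetical in-memory schedule.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Schedule {
    pub target_all_services: bool,
    pub include_control_plane: bool,
    pub attached_services: Vec<String>,
}

/// Rejects a schedule that would back up nothing.
fn validate(s: &Schedule) -> Result<(), String> {
    if !s.target_all_services && !s.include_control_plane && s.attached_services.is_empty() {
        return Err("schedule has nothing to back up".to_string());
    }
    Ok(())
}

/// Creation happens before any service can be attached, so it validates an empty list.
pub fn create_backup_schedule(target_all_services: bool, include_control_plane: bool) -> Result<Schedule, String> {
    let s = Schedule { target_all_services, include_control_plane, attached_services: Vec::new() };
    validate(&s)?;
    Ok(s)
}

pub fn attach_service(s: &mut Schedule, service: &str) {
    s.attached_services.push(service.to_string());
}

/// Updates are validated against the attached services, so the flip can succeed after attach.
pub fn set_include_control_plane(s: &mut Schedule, include: bool) -> Result<(), String> {
    let mut candidate = s.clone();
    candidate.include_control_plane = include;
    validate(&candidate)?;
    *s = candidate;
    Ok(())
}

/// The workaround: create with the control plane included, attach, then flip it off.
pub fn provision_service_scoped(service: &str) -> Result<Schedule, String> {
    let mut s = create_backup_schedule(false, true)?;
    attach_service(&mut s, service);
    set_include_control_plane(&mut s, false)?;
    Ok(s)
}
```

In this model the direct `create_backup_schedule(false, false)` call fails, which mirrors the bug, while `provision_service_scoped` reaches the intended service-scoped state.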

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01VxXa3ZKZTjfKs1gQ8zcDcj