Summary
We would like to propose enhancements to the VitessBackupSchedule feature introduced via #553 in vitess-operator to improve operational flexibility, reduce bandwidth spikes, and simplify configuration for large deployments.
This request includes:
- Cluster/Keyspace-wide backup modes (all shards)
- Flexible scheduling with optional randomized distribution
1️⃣ Feature: Backup All Shards in Keyspace / Cluster
Problem
Currently, backups require specifying shard-level configuration. This becomes operationally heavy for:
- Large keyspaces with many shards
- Multi-keyspace clusters
- Environments where consistent backup policies are desired across all shards
Proposal
Introduce higher-level backup modes:
- BackupAllShardsInKeyspace
- BackupAllShardsInCluster
These modes would:
- Automatically discover shards
- Avoid requiring per-shard configuration
- Potentially execute backups in parallel, with configurable concurrency
- Rework the prior sequential implementation to avoid excessive runtime
Goal
Simplify configuration and reduce operational overhead for large Vitess deployments.
2️⃣ Feature: Flexible Backup Scheduling (Fixed or Randomized)
Current State
The schedule is currently cron-based and user-specified.
Problem
For large keyspaces, triggering backups for all shards at the same time can:
- Cause significant upload bandwidth spikes
- Create resource contention
- Increase operational risk
Instead, users could configure a frequency (e.g., every 24 hours or every 12 hours), and the operator would then:
- Generate per-shard randomized cron schedules
- Persist the selected cron schedule
- Stagger backups across shards
Proposal
Allow users to specify either:
Option A – Fixed Schedule
schedule: "0 1 * * *" # Every day at 1am
Option B – Frequency-Based (Randomized)
backupEvery: 24h
randomizePerShard: true
Behavior:
- Operator generates a random cron schedule per shard
- Schedule is persisted
- First backup waits until scheduled time
- Backups are evenly distributed over time