Feature Request: Keyspace-wide Backups & Flexible Scheduling in vitess-operator #752

@bluecrabs007

Summary

We would like to propose enhancements to the VitessBackupSchedule feature introduced via #553 in vitess-operator to improve operational flexibility, reduce bandwidth spikes, and simplify configuration for large deployments.

This request includes:

  1. Cluster/Keyspace-wide backup modes (all shards)
  2. Flexible scheduling with optional randomized distribution

1️⃣ Feature: Backup All Shards in Keyspace / Cluster

Problem

Currently, backups must be configured at the shard level. This becomes operationally heavy for:

  • Large keyspaces with many shards
  • Multi-keyspace clusters
  • Environments where consistent backup policies are desired across all shards

Proposal

Introduce higher-level backup modes:

  • BackupAllShardsInKeyspace
  • BackupAllShardsInCluster

These modes would:

  • Automatically discover shards
  • Avoid requiring per-shard configuration
  • Potentially execute backups in parallel (or with configurable concurrency)
  • Rework the prior sequential implementation to avoid excessive runtime
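
As a rough illustration of the "configurable concurrency" idea, here is a minimal Go sketch that runs one backup per shard with a bounded number in flight. The names `backupFunc` and `backupAllShards` are hypothetical and not part of vitess-operator; shard discovery and the actual backup call are assumed to exist elsewhere.

```go
package main

import "sync"

// backupFunc is a hypothetical stand-in for whatever call actually
// triggers a backup of a single shard.
type backupFunc func(shard string) error

// backupAllShards runs backup once per shard with at most limit backups
// in flight at a time. It returns the errors of failed shards, keyed by
// shard name; an empty map means every backup succeeded.
func backupAllShards(shards []string, limit int, backup backupFunc) map[string]error {
	if limit < 1 {
		limit = 1 // fall back to sequential execution
	}
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		failed = make(map[string]error)
		slots  = make(chan struct{}, limit) // counting semaphore
	)
	for _, shard := range shards {
		wg.Add(1)
		slots <- struct{}{} // blocks while limit backups are running
		go func(shard string) {
			defer wg.Done()
			defer func() { <-slots }() // free the slot when done
			if err := backup(shard); err != nil {
				mu.Lock()
				failed[shard] = err
				mu.Unlock()
			}
		}(shard)
	}
	wg.Wait()
	return failed
}
```

With `limit` set to 1 this degrades to the sequential behavior; a larger value trades total runtime against upload bandwidth and load on the tablets.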

Goal

Simplify configuration and reduce operational overhead for large Vitess deployments.


2️⃣ Feature: Flexible Backup Scheduling (Fixed or Randomized)

Current State

The schedule is currently cron-based and user-specified.

Problem

For large keyspaces, triggering backups for all shards at the same time can:

  • Cause significant upload bandwidth spikes
  • Create resource contention
  • Increase operational risk

To avoid this, users could configure a frequency (e.g., every 24 hours or every 12 hours), and the operator would then:

  • Generate per-shard randomized cron schedules
  • Persist the selected cron schedule
  • Stagger backups across shards
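
One possible way to derive a per-shard cron expression from a frequency is sketched below. This is only an illustration under stated assumptions: `randomCron` and `schedulesForShards` are hypothetical names, only whole-hour intervals that divide 24h are handled, and persisting the result (for example in the resource status) is left to the caller.

```go
package main

import (
	"fmt"
	"math/rand"
	"strconv"
	"strings"
	"time"
)

// randomCron returns a 5-field cron expression that fires once per
// `every` interval, at a randomly chosen offset inside that interval.
// Assumption: only whole-hour intervals that divide 24h are supported.
func randomCron(every time.Duration, rng *rand.Rand) (string, error) {
	const day = 24 * time.Hour
	if every < time.Hour || every > day || every%time.Hour != 0 || day%every != 0 {
		return "", fmt.Errorf("unsupported interval %s", every)
	}
	offset := rng.Intn(int(every / time.Minute)) // minutes into the interval
	minute, firstHour := offset%60, offset/60
	step := int(every / time.Hour)

	// List the hours explicitly (e.g. "3,15") instead of relying on
	// step syntax, which differs between cron implementations.
	var hours []string
	for h := firstHour; h < 24; h += step {
		hours = append(hours, strconv.Itoa(h))
	}
	return fmt.Sprintf("%d %s * * *", minute, strings.Join(hours, ",")), nil
}

// schedulesForShards draws one schedule per shard. The caller would
// persist the returned map so schedules survive operator restarts.
func schedulesForShards(shards []string, every time.Duration, seed int64) (map[string]string, error) {
	rng := rand.New(rand.NewSource(seed))
	out := make(map[string]string, len(shards))
	for _, s := range shards {
		expr, err := randomCron(every, rng)
		if err != nil {
			return nil, err
		}
		out[s] = expr
	}
	return out, nil
}
```

For `every = 12h`, each shard gets an expression of the form `M H,H+12 * * *` with a random minute `M` and a random first hour `H` below 12.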

Proposal

Allow users to specify either:

Option A – Fixed Schedule

schedule: "0 1 * * *"  # Every day at 1am

Option B – Frequency-Based (Randomized)

backupEvery: 24h
randomizePerShard: true

Behavior:

  • The operator generates a random cron schedule per shard
  • The schedule is persisted
  • The first backup waits until its scheduled time
  • Backups are distributed roughly evenly over time
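
Since Option A and Option B are alternatives, the operator would likely need to reject specs that set both or neither. A minimal validation sketch follows; the `scheduleSpec` type is hypothetical and only mirrors the YAML keys used in this issue, and the rule that `randomizePerShard` requires `backupEvery` is an assumption rather than part of the proposal.

```go
package main

import (
	"errors"
	"time"
)

// scheduleSpec is a hypothetical shape for the proposed fields; the names
// mirror the YAML keys in this issue, not an existing vitess-operator type.
type scheduleSpec struct {
	Schedule          string        // Option A: fixed cron expression
	BackupEvery       time.Duration // Option B: frequency, e.g. 24h
	RandomizePerShard bool          // Option B only (assumption)
}

// validate enforces that exactly one of the two options is used.
func (s scheduleSpec) validate() error {
	hasCron := s.Schedule != ""
	hasFreq := s.BackupEvery > 0
	switch {
	case s.BackupEvery < 0:
		return errors.New("backupEvery must be positive")
	case hasCron && hasFreq:
		return errors.New("schedule and backupEvery are mutually exclusive")
	case !hasCron && !hasFreq:
		return errors.New("one of schedule or backupEvery is required")
	case hasCron && s.RandomizePerShard:
		return errors.New("randomizePerShard requires backupEvery")
	}
	return nil
}
```

In a real operator this kind of check would more likely live in a CRD validation rule or admission webhook; the function form here is only meant to make the intended either/or semantics explicit.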
