docs/velero-with-swift-vsphere-csi-config-guide.md (155 additions, 0 deletions)

# Velero Backup Configuration

## Overview
Velero provides backup and disaster recovery for Kubernetes clusters using OpenStack Swift for object storage and vSphere CSI for volume snapshots.

## Key Configuration Choices

### Storage Backend
```yaml
backupStorageLocation:
  - name: iad3-flex-dei7343-a9256
    provider: community.openstack.org/openstack
    bucket: k8s-dr-velero
```
**Why**: Uses OpenStack Swift via the community plugin for backup metadata storage.

**Note**: We initially attempted to use the AWS S3 plugin against Swift's S3-compatible endpoint, but Swift does not support AWS chunked uploads (used for large objects). The native OpenStack plugin provides better compatibility with Swift's API.

### CSI Snapshot Integration
```yaml
configuration:
  features: EnableCSI
  defaultSnapshotMoveData: false
  defaultVolumesToFsBackup: false
volumeSnapshotLocation: []
```
**Why**:
- `features: EnableCSI`: Enables CSI snapshot support for volume backups
- `defaultSnapshotMoveData: false`: Keeps CSI snapshots in place rather than moving snapshot data to object storage
- `defaultVolumesToFsBackup: false`: Prevents automatic file-level backups (opt-in only)
- `volumeSnapshotLocation: []`: Disables legacy VolumeSnapshotLocation (CSI uses VolumeSnapshotClass instead)

### VolumeSnapshotClass
```yaml
extraObjects:
  - apiVersion: snapshot.storage.k8s.io/v1
    kind: VolumeSnapshotClass
    metadata:
      name: velero-vsphere-snapshot-class
      labels:
        velero.io/csi-volumesnapshot-class: "true"
    driver: csi.vsphere.vmware.com
    deletionPolicy: Delete
```
**Why**: Defines how Velero creates CSI snapshots. The label `velero.io/csi-volumesnapshot-class: "true"` tells Velero to use this class for backups.

### Credentials
```yaml
configuration:
  extraEnvVars: {}
podEnvFrom:
  - secretRef:
      name: cloud-credentials
```
**Why**: OpenStack plugin requires environment variables (`OS_AUTH_URL`, `OS_APPLICATION_CREDENTIAL_ID`, etc.) for authentication. The secret contains both individual env vars and a `cloud` key for Velero's credential file mount.
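As a sketch of what the `cloud` key can look like, here is a hypothetical clouds.yaml fragment embedded in the same secret (the cloud name `openstack` and `auth_type` value are illustrative assumptions, not taken from this deployment):

```yaml
# Hypothetical sketch: the same secret carrying a clouds.yaml under the `cloud` key
stringData:
  cloud: |
    clouds:
      openstack:
        auth_type: v3applicationcredential
        auth:
          auth_url: https://keystone.api.iad3.rackspacecloud.com/v3
          application_credential_id: <credential-id>
          application_credential_secret: <credential-secret>
        region_name: IAD3
```

Velero mounts this key as a credential file, while the individual `OS_*` keys feed `podEnvFrom`.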

### Node Agent
```yaml
deployNodeAgent: false
```
**Why**: Currently disabled. Kopia (the file-level backup engine) doesn't support OpenStack Swift backend. Requires further research into:
- Using S3-compatible Swift endpoint for Kopia
- Alternative storage backends for file-level backups
- Hybrid approach with separate storage for file-level vs metadata

CSI snapshots provide sufficient backup coverage for current needs.

## Common Pitfalls

### Kopia Backend Incompatibility
**Problem**: "invalid backend type community.openstack.org/openstack" errors during file-level backups.

**Solution**: Kopia (used by node-agent) doesn't support OpenStack. Set `defaultVolumesToFsBackup: false` to use CSI snapshots by default.

### Missing Environment Variables
**Problem**: "Missing input for argument [auth_url]" authentication errors.

**Solution**: The secret must be mounted as environment variables using `podEnvFrom`. Individual `OS_*` keys in the secret are required, not just a clouds.yaml file.

### Swift Temp URL Authentication
**Problem**: "401 Unauthorized: Temp URL invalid" errors.

**Solution**: Both `OS_SWIFT_TEMP_URL_KEY` and `OS_SWIFT_TEMP_URL_DIGEST` are required in the credentials secret. These must match the temp URL key configured on the Swift container.

```bash
# Set temp URL key on the Swift container (if not already set)
swift post k8s-dr-velero -m "Temp-URL-Key: <your-key>"
```

The `OS_SWIFT_TEMP_URL_KEY` value must match the key set on the container, and `OS_SWIFT_TEMP_URL_DIGEST` specifies the hash algorithm (typically `sha256`).
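To make the key/digest pairing concrete, here is a minimal sketch of how a Swift temp URL signature is derived (assuming the standard Swift tempurl middleware; the account path and key are illustrative placeholders):

```python
import hmac
from hashlib import sha256
from time import time

def swift_temp_url_sig(key: str, method: str, path: str, expires: int) -> str:
    """HMAC over "METHOD\\nexpires\\npath", hex-encoded.

    This mirrors what the plugin computes from OS_SWIFT_TEMP_URL_KEY
    when OS_SWIFT_TEMP_URL_DIGEST is sha256.
    """
    body = f"{method}\n{expires}\n{path}"
    return hmac.new(key.encode(), body.encode(), sha256).hexdigest()

# Illustrative usage: sign a one-hour GET URL for an object in the backup container
expires = int(time()) + 3600
path = "/v1/AUTH_account/k8s-dr-velero/backups/test"
sig = swift_temp_url_sig("<temp-url-key>", "GET", path, expires)
print(f"{path}?temp_url_sig={sig}&temp_url_expires={expires}")
```

If the key in the secret differs from the key set on the container, the HMAC on each side differs and Swift returns the "401 Unauthorized: Temp URL invalid" error above.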

### VolumeSnapshotLocation Errors
**Problem**: "spec.provider: Required value" during Helm upgrade.

**Solution**: Set `volumeSnapshotLocation: []` to disable legacy snapshot locations. CSI snapshots use VolumeSnapshotClass instead.

### Pod Security Standards
**Problem**: node-agent DaemonSet fails with "violates PodSecurity" errors.

**Solution**: Velero namespace requires `privileged` Pod Security Standard for hostPath volumes.
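A minimal sketch of the namespace labeling (standard Pod Security Standards labels; adjust if the namespace is managed elsewhere):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: velero
  labels:
    pod-security.kubernetes.io/enforce: privileged
```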

## Required Secrets

### cloud-credentials
Contains OpenStack authentication credentials. Must include both environment variables and a `cloud` key.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cloud-credentials
  namespace: velero
stringData:
  # Environment variables for OpenStack plugin
  OS_AUTH_URL: https://keystone.api.iad3.rackspacecloud.com/v3
  OS_APPLICATION_CREDENTIAL_ID: <credential-id>
  OS_APPLICATION_CREDENTIAL_SECRET: <credential-secret>
  OS_REGION_NAME: IAD3
  OS_SWIFT_TEMP_URL_KEY: <temp-url-key>
  OS_SWIFT_TEMP_URL_DIGEST: sha256
  # The `cloud` key (clouds.yaml content) is also required; omitted here
```

**Key Fields**:
- `OS_AUTH_URL`: OpenStack Keystone endpoint (required)
- `OS_APPLICATION_CREDENTIAL_ID`: Application credential ID (required)
- `OS_APPLICATION_CREDENTIAL_SECRET`: Application credential secret (required)
- `OS_REGION_NAME`: OpenStack region (required)
- `OS_SWIFT_TEMP_URL_KEY`: Temp URL key for Swift authentication (required, must match container setting)
- `OS_SWIFT_TEMP_URL_DIGEST`: Hash algorithm for temp URLs (required, typically `sha256`)

## Verification
```bash
# Check backup storage location
kubectl get backupstoragelocation -n velero

# Verify CSI snapshot class
kubectl get volumesnapshotclass velero-vsphere-snapshot-class

# Test backup
velero backup create test --include-namespaces=default

# Check backup status
velero backup describe test --details

# View backup logs
velero backup logs test
```

## Backup Usage

### CSI Snapshot Backup (Current Method)
```bash
velero backup create my-backup --include-namespaces=myapp
```
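For recurring backups, a Schedule resource can wrap the same CSI-snapshot settings. A hypothetical sketch (name, cron expression, and TTL are illustrative, not from this deployment):

```yaml
# Hypothetical daily backup schedule for the myapp namespace
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: myapp-daily
  namespace: velero
spec:
  schedule: "0 2 * * *"   # 02:00 daily, cron syntax
  template:
    includedNamespaces:
      - myapp
    ttl: 168h             # retain each backup for 7 days
```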

### File-Level Backup (Future)
File-level backups via node-agent are currently disabled and require:
- Compatible storage backend (Kopia doesn't support OpenStack)
- Additional testing and validation
- Possible migration to S3-compatible Swift endpoint or alternative backend

For now, all backups use CSI snapshots exclusively.
docs/vsphere-csi-config-guide.md (114 additions, 0 deletions)

# vSphere CSI Driver Configuration

## Overview
vSphere CSI driver provides persistent storage and snapshot capabilities for Kubernetes workloads running on vSphere infrastructure.

## Key Configuration Choices

### Snapshot Support
```yaml
controller:
  replicaCount: 3
  config:
    block-volume-snapshot: true
  snapshotter:
    image:
      registry: registry.k8s.io
      repository: sig-storage/csi-snapshotter
      tag: v8.2.0
```
**Why**:
- `block-volume-snapshot: true`: Enables block volume snapshot capability in the CSI driver
- `snapshotter` sidecar: Required for the CSI controller to handle VolumeSnapshot requests from Velero

Both settings are required for CSI snapshot functionality.

### Snapshot Controller
```yaml
snapshot:
  controller:
    enabled: true
```
**Why**: Deploys the snapshot-controller which watches VolumeSnapshot resources and coordinates with the CSI driver to create snapshots.

## Common Pitfalls

### Missing Snapshotter Sidecar
**Problem**: VolumeSnapshots stuck in "Waiting for CSI driver" state.

**Solution**: The `controller.snapshotter` configuration must be present in the Helm values. The snapshotter sidecar container is NOT enabled by default and must be explicitly configured.

**Verification**:
```bash
kubectl get pod -n vmware-system-csi <controller-pod> -o jsonpath='{.spec.containers[*].name}'
```
Should include `csi-snapshotter` in the output.

### Pod Security Standards
**Problem**: CSI pods fail to start with "violates PodSecurity" errors.

**Solution**: The vmware-system-csi namespace requires `privileged` Pod Security Standard due to hostPath volumes and privileged containers.

```yaml
metadata:
  labels:
    pod-security.kubernetes.io/enforce: privileged
```

## Required Secrets

### vsphere-config-secret (CSI Driver)
Contains vSphere connection details for the CSI driver. Key: `csi-vsphere.conf`

```ini
[Global]
cluster-id = "k8s-dr"

[VirtualCenter "vcenter.example.com"]
insecure-flag = "true"
user = "administrator@vsphere.local"
password = "password"
port = "443"
datacenters = "Datacenter1"
```

**Key Fields**:
- `cluster-id`: Unique identifier for this Kubernetes cluster
- `insecure-flag`: Set to "true" for self-signed certificates
- `datacenters`: vSphere datacenter name(s)

### vsphere-cpi-secret (Cloud Provider Interface)
Contains vSphere configuration for the CPI. Key: `vsphere.conf`

```yaml
global:
  port: 443
  insecureFlag: true

vcenter:
  vcenter-name:
    server: vcenter.example.com
    user: administrator@vsphere.local
    password: "password"
    datacenters:
      - Datacenter1
```

**Key Fields**:
- `vcenter-name`: Arbitrary name for this vCenter (used as identifier)
- `server`: vCenter hostname or IP
- `datacenters`: List of datacenter names

**Note**: Both secrets use the same vSphere credentials but different formats (INI vs YAML).

## Verification
```bash
# Check CSI driver is registered
kubectl get csidrivers csi.vsphere.vmware.com

# Verify snapshot controller is running
kubectl get pods -n vmware-system-csi | grep snapshot-controller

# Test snapshot capability
kubectl get volumesnapshotclass
```
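To exercise the snapshot path end to end, a manual VolumeSnapshot can be created against an existing claim. A hypothetical sketch (the PVC name `my-pvc` is a placeholder, and the class name should match whatever `kubectl get volumesnapshotclass` reports):

```yaml
# Hypothetical manual snapshot of an existing PVC
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: test-snapshot
spec:
  volumeSnapshotClassName: velero-vsphere-snapshot-class
  source:
    persistentVolumeClaimName: my-pvc
```

Once applied, `kubectl get volumesnapshot test-snapshot` should eventually report `READYTOUSE: true` if the snapshotter sidecar and snapshot-controller are both healthy.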