
@shubham-pampattiwar (Member) commented on Oct 2, 2025

Summary

Implements Issue #6: Design and create container with tools to access restored file systems.

This PR adds a file-server container that provides comprehensive tooling to mount VM disk images (qcow2, raw) and access their filesystems from Velero/OADP backups. This container will be used by the VMFR controller (Issue #7) to create file-serving pods.

Status: ✅ Live cluster testing completed successfully on OpenShift Virtualization

Design Decision: All-in-One Container

After researching KubeVirt and OADP patterns, we chose an all-in-one container approach over a plugin architecture:

Rationale:

  • ✅ Limited, well-defined set of VM filesystem types (ext4, xfs, ntfs, btrfs)
  • ✅ Combined tooling keeps the image manageable (~1.1GB total)
  • ✅ Simpler deployment, maintenance, and debugging
  • ✅ Plugin complexity not justified for this focused use case

Technology Stack

libguestfs - Core VM Disk Access

  • Mounts VM disk images without booting the VM
  • Uses internal QEMU appliance to read disk formats
  • Exposes filesystems via FUSE
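
For concreteness, the core primitive here is guestmount; a minimal invocation against a restored disk image looks like this (paths are illustrative, not taken from the actual scripts):

# -a adds the disk image, -i lets libguestfs inspect the guest and mount
# its filesystems in the right places, --ro keeps everything read-only
guestmount -a /disks/disk.img -i --ro /mnt/filesystems/disk

# Unmount cleanly so libguestfs can shut down its appliance
guestunmount /mnt/filesystems/disk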

FUSE (Filesystem in Userspace)

  • Mounts filesystems in userspace (safer than kernel mounting)
  • No kernel modules required
  • Standard Kubernetes/OpenShift technology

QEMU with KVM Hardware Acceleration

  • libguestfs uses QEMU to read disk images
  • KVM required for acceptable performance: appliance boot takes ~30 seconds vs 5+ minutes under TCG emulation
  • Requires /dev/kvm access
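
To check whether the appliance will actually get hardware acceleration, libguestfs ships a standard self-test (shown here as an illustration; it boots the appliance and reports diagnostics):

# Is the KVM device visible inside the container?
ls -l /dev/kvm

# Boots the libguestfs appliance and prints diagnostics, including
# whether KVM or TCG (software emulation) ended up being used
libguestfs-test-tool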

Security Model (Based on Live Testing)

Initial Assumption: FUSE-based mounting = no privileged mode needed ✅

Live Testing Reality: Privileged mode IS required for performance ⚠️

Why Privileged Mode?

libguestfs → QEMU → /dev/kvm (hardware acceleration)
                     ↑
                     SELinux blocks non-privileged access
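
If /dev/kvm exists but access still fails, the denial can be confirmed from the node's audit log (a hypothetical troubleshooting step; requires node access and the audit tooling):

# Check the SELinux context on the device, then look for recent AVC denials
ls -lZ /dev/kvm
ausearch -m avc -ts recent | grep kvm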

With KVM (privileged mode):

  • libguestfs ready in ~30 seconds ✅
  • Acceptable user experience

Without KVM (non-privileged):

  • Falls back to TCG software emulation
  • libguestfs takes 5+ minutes (often times out) ❌
  • Unacceptable for production

Security Justification

Comparison with KubeVirt VMs:

virt-launcher pod (runs VMs):
- privileged: true
- Runs as qemu user (107:107)
- Access to /dev/kvm
- Runs arbitrary guest OS code

file-server pod:
- privileged: true (for /dev/kvm only)
- Runs as qemu user (107:107)  
- Access to /dev/kvm
- Runs libguestfs (trusted, signed code)

Risk level: SAME or LOWER than running VMs

Conclusion: If cluster runs OpenShift Virtualization, it already allows this security model.

Security Requirements

  • ✅ Privileged mode (for /dev/kvm access with SELinux)
  • ✅ qemu user (107:107) - matches VM disk ownership
  • ✅ SELinux MCS labels must match PVC
  • ✅ PVC mounted read-write (libguestfs needs write access to the disk file for its COW overlay)
  • ✅ kubevirt-controller SCC (allows privileged + hostPath)
  • ✅ Filesystem still mounted read-only (--ro flag)
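
Taken together, the checklist translates into a pod spec along these lines (a sketch with illustrative names and an example MCS label; the tested configuration lives in test-pod.yaml):

oc apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: file-server-example        # illustrative name
spec:
  securityContext:
    fsGroup: 107                   # qemu group, matches VM disk ownership
    seLinuxOptions:
      level: "s0:c26,c5"           # example value; must match the PVC's MCS label
  containers:
  - name: file-server
    image: file-server:latest      # illustrative image reference
    securityContext:
      privileged: true             # required for /dev/kvm under SELinux
      runAsUser: 107               # qemu user
      runAsGroup: 107
    env:
    - name: HOME
      value: /tmp                  # libguestfs needs a writable cache directory
    volumeMounts:
    - name: vm-disk
      mountPath: /disks            # PVC mounted read-write (COW overlay)
  volumes:
  - name: vm-disk
    persistentVolumeClaim:
      claimName: restored-vm-disk  # illustrative claim name
EOF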

Live Cluster Testing Results ✅

Environment:

  • Cluster: OpenShift Virtualization (OCP Virt)
  • Test VM: Fedora with XFS filesystem
  • VM Disk: 9.8GB raw format in PVC
  • Test Files: 4 files in different locations

Test Results:

✅ File-server pod created successfully
✅ VM disk PVC mounted (with correct security context)
✅ libguestfs appliance booted (~30 seconds with KVM)
✅ XFS filesystem mounted via guestmount
✅ All test files accessible:
   - /root/test-file.txt
   - /root/oadp-validation.json
   - /etc/oadp-test-marker
   - /home/testuser/validation.txt

Performance:

  • Appliance boot: ~30 seconds (with KVM hardware acceleration)
  • Filesystem mount: ~5 seconds
  • File access: Instant

Key Features

  • VM disk format detection: qemu-img automatically detects qcow2, raw, vmdk, vdi (see the example after this list)
  • FUSE-based mounting: guestmount mounts disk images via userspace
  • Comprehensive filesystem support: ext2/3/4, XFS, Btrfs, NTFS, FAT/FAT32
  • LVM support: guestmount auto-handles VMs with LVM volumes
  • Read-only safety: All filesystems mounted with --ro for data integrity
  • Auto-detection: Automatically discovers and mounts all disk images
  • Hardware acceleration: KVM support for fast operation
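
Format detection leans on qemu-img's standard output (an illustrative invocation):

# "format" in the JSON output reports qcow2, raw, vmdk, vdi, etc.
qemu-img info --output=json /disks/disk.img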

Comprehensive Documentation

All decisions are explained in detail:

README.md Enhancements

  • Key Concepts section: Explains libguestfs, FUSE, QEMU/KVM, hardware acceleration
  • Complete Workflow section: Step-by-step VM backup → restore → file access with diagrams
  • Security section: Explains ALL security decisions:
    • Why privileged mode (SELinux + /dev/kvm)
    • Why qemu user (VM disk ownership)
    • Why SELinux MCS labels (pod isolation)
    • Why RW PVC mount (libguestfs COW overlay)
    • Security justification vs KubeVirt VMs
    • Complete checklist for controller implementation

Dockerfile Enhancements

  • Every package documented with WHY and WHAT
  • 17 packages organized into 5 categories
  • FUSE architecture explained

Deliverables

containers/file-server/
├── Dockerfile                      # Fedora 40, comprehensively documented
├── README.md                       # 850+ lines with complete technical docs
├── test-container.sh               # Validation tests (all passing ✓)
├── test-pod.yaml                   # Working pod config from live testing
├── test-vm.yaml                    # Test VM manifest
├── .gitignore                      # Excludes analysis files
└── scripts/
    ├── detect-and-mount.sh         # 313 lines, FUSE-based mounting
    └── entrypoint.sh               # Simple entrypoint
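
For orientation, a minimal sketch of the flow detect-and-mount.sh implements (the real script is ~313 lines; directory names and error handling are simplified here):

#!/bin/bash
# Sketch only: discover disk images, FUSE-mount each one read-only,
# then keep the container alive so the mounts stay active.
set -uo pipefail   # intentionally no -e; see the commit notes below

DISK_DIR="${DISK_DIR:-/disks}"
MOUNT_ROOT="${MOUNT_ROOT:-/mnt/filesystems}"

for disk in "$DISK_DIR"/*; do
    [ -f "$disk" ] || continue
    target="$MOUNT_ROOT/$(basename "$disk")"
    mkdir -p "$target"
    guestmount -a "$disk" -i --ro "$target"
done

# FUSE mounts live only as long as this process does
exec sleep infinity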

Test Plan

Local Testing

cd containers/file-server/
./test-container.sh all

Results: ✅ All tests passing

Upstream/Downstream Strategy

Upstream (this PR):

  • Base image: Fedora 40
  • Full multi-arch support (ARM64 + x86_64)
  • All packages natively available (no EPEL needed)

Downstream (Red Hat Product):

  • Red Hat will rebuild with RHEL9 base + subscriptions
  • Follows standard Red Hat practice (e.g., Velero uses Ubuntu upstream, RHEL downstream)

Acceptance Criteria

  • Design accepted (all-in-one documented in README)
  • Container supports common VM filesystem types (ext4, xfs, ntfs, btrfs, FAT)
  • Container includes necessary filesystem utilities and drivers
  • Support for read-only mounting to ensure data safety
  • Kubernetes-compatible mounting (FUSE-based)
  • Security requirements documented (privileged mode, qemu user, SELinux)
  • Live cluster testing completed successfully
  • Functionality works as described
  • Tests are added/updated
  • Documentation is comprehensive (explains WHY and WHAT)

Integration Path

Related

🤖 Generated with Claude Code

Live Cluster Testing

# 1. Create test VM
oc apply -f test-vm.yaml

# 2. Add test data to VM, then stop it

# 3. Deploy file-server pod (using detect-and-mount.sh)
oc apply -f test-pod.yaml

# 4. Verify pod is running
oc get pod file-server-test
# Expected: Running status

# 5. Verify filesystem is mounted
oc exec file-server-test -- mount | grep /mnt/filesystems
# Expected: /dev/fuse on /mnt/filesystems/disk

# 6. Verify test files are accessible
oc exec file-server-test -- cat /mnt/filesystems/disk/root/test-file.txt
oc exec file-server-test -- cat /mnt/filesystems/disk/etc/oadp-test-marker
oc exec file-server-test -- cat /mnt/filesystems/disk/home/testuser/validation.txt

Results: ✅ Complete end-to-end success

  • Pod status: Running (stays alive with sleep infinity)
  • Automated mounting: detect-and-mount.sh successfully runs at pod startup
  • Filesystem mounted: /dev/fuse on /mnt/filesystems/disk type fuse
  • All test files accessible from multiple directories (/root/, /etc/, /home/testuser/)
  • FUSE mounts remain active while pod runs

shubham-pampattiwar and others added 7 commits October 2, 2025 13:46

Implements Issue migtools#6: Design and create container with tools to access restored file systems

This container provides comprehensive tooling to mount VM disk images (qcow2, raw)
and access their filesystems (ext4, xfs, ntfs, btrfs, etc.) from Velero/OADP backups.

Design decisions:
- All-in-one container approach (not plugin-based) for simplicity and maintainability
- Fedora 40 base image with full multi-arch support (ARM64 + x86_64)
- Follows Red Hat upstream/downstream pattern (Fedora upstream, RHEL9 downstream)
- Non-root user (UID 1001) following OpenShift security best practices

Key features:
- Automatic VM disk format detection (qemu-img)
- NBD-based disk mounting (qemu-nbd)
- Comprehensive filesystem support (ext4, xfs, ntfs, btrfs, FAT)
- Read-only mounting for data safety
- Helper scripts for detection and mounting
- Comprehensive tests and documentation

Deliverables:
- containers/file-server/Dockerfile (Fedora 40 base, fully commented)
- containers/file-server/scripts/detect-and-mount.sh (370 lines, auto-mount logic)
- containers/file-server/scripts/entrypoint.sh (Simple entrypoint)
- containers/file-server/test-container.sh (Validation tests - all passing)
- containers/file-server/README.md (Complete documentation)

Acceptance criteria met:
✓ Design accepted (all-in-one approach documented)
✓ Container supports common VM filesystem types
✓ Includes necessary filesystem utilities and drivers
✓ Read-only mounting for data safety
✓ Functionality works as described (tests pass)
✓ Tests added/updated
✓ Documentation updated

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Replace NBD (qemu-nbd) approach with FUSE-based guestmount to ensure
the container works in Kubernetes without requiring privileged mode.

Key Changes:
- detect-and-mount.sh: Complete rewrite to use guestmount instead of NBD
  - Removed: connect_nbd_device(), detect_filesystem(), mount_filesystem()
  - Added: mount_disk_with_guestmount(), unmount_disk()
  - Simplified process_disk_image() - guestmount auto-handles partitions/LVM
- Dockerfile: Removed nbd and qemu-kvm-core packages (NBD-specific)
- README.md: Updated security section to highlight no privileged mode needed
- test-container.sh: Changed to verify guestmount tools instead of qemu-nbd

Benefits:
✅ No privileged container required
✅ No kernel module loading (no modprobe nbd)
✅ No /dev access needed
✅ Works with OpenShift SecurityContextConstraints
✅ Compatible with standard Kubernetes security policies

Trade-off: Slightly slower than NBD, but security > speed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Add libguestfs-xfs package to provide XFS filesystem support for the
libguestfs appliance used by guestmount. This ensures VMs with XFS
root filesystems can be properly mounted and accessed.

Based on peer review feedback comparing package lists with their
kopia toolbox container implementation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

This commit completes Issue migtools#6 with extensive documentation explaining
all design decisions, security requirements, and technical concepts.

Major Documentation Enhancements:

1. README.md - Comprehensive Technical Documentation
   - New "Key Concepts" section explaining:
     * libguestfs - How it accesses VM disks without booting
     * FUSE - Userspace filesystem mounting architecture
     * QEMU/KVM - Hardware acceleration and performance impact
     * TCG vs KVM - 30 seconds vs 5+ minutes boot time comparison

   - New "Complete Workflow" section with step-by-step diagrams:
     * VM Backup → Restore → File Access flow
     * Timeline breakdowns (with/without KVM)
     * Integration with VMFR controller

   - Massively expanded "Security" section explaining:
     * Why privileged mode required (SELinux + /dev/kvm access)
     * Why qemu user (107:107) - matches VM disk ownership
     * Why SELinux MCS labels must match PVC
     * Why PVC needs RW mount (libguestfs COW overlay)
     * Why kubevirt-controller SCC
     * Security justification vs KubeVirt VMs threat model
     * Complete security requirements checklist for controller

2. Dockerfile - Enhanced Tool Explanations
   - Updated FUSE section with architecture diagram
   - Clarified privileged mode requirement from live testing
   - All 17 packages documented with why/what/coverage

3. Live Cluster Testing Artifacts
   - test-pod.yaml: Complete working pod configuration
     * Tested with real VM disk (9.8GB raw, XFS filesystem)
     * All security contexts documented and explained
     * Privileged mode, qemu user, SELinux labels

   - test-vm.yaml: Test VM for end-to-end validation
     * Fedora VM with cloud-init
     * Clear testing workflow instructions

Live Cluster Testing Results (October 2025):
- Cluster: OpenShift Virtualization (OCP Virt)
- VM Disk: 9.8GB raw format with XFS filesystem
- Test Data: 4 files in different locations (/root, /etc, /home)
- Result: ✅ All files successfully accessible via guestmount
- Performance: ~30 seconds with KVM (vs 5+ min without)

Key Findings:
- Privileged mode IS required for /dev/kvm access (SELinux)
- qemu user (107:107) required for VM disk PVC access
- SELinux MCS labels must match between pod and PVC
- PVC must be mounted RW (libguestfs needs write to disk file)
- Same security model as KubeVirt virt-launcher pods

All documentation now explains WHY each decision was made and
WHAT each technology does, ensuring anyone can understand the
complete architecture and security model.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

FUSE mounts require the container to stay running. After mounting
filesystems with guestmount, the script now enters 'sleep infinity'
to keep the container alive and maintain the FUSE mounts.

Without this, the script would exit, causing the container to exit,
and Kubernetes would restart it in an infinite loop.

Future: This will be replaced with SSH/HTTP server (Issues migtools#8, migtools#9)
which will naturally keep the container alive while serving files.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Changes:
1. detect-and-mount.sh: Fixed script exit issue
   - Changed 'set -euo pipefail' to 'set -uo pipefail' (removed -e flag)
   - Root cause: pipefail with 'set -e' was causing early script exit
   - Script now successfully mounts filesystem and keeps container alive
   - Added 'exec sleep infinity' to maintain FUSE mounts

2. test-pod.yaml: Updated for automated mounting
   - Changed command from 'sleep infinity' to '/usr/local/bin/detect-and-mount.sh'
   - Added HOME=/tmp environment variable for libguestfs cache

3. README.md: Updated testing status
   - Added automated script testing results
   - Confirmed pod runs continuously with FUSE mounts active
   - All test files verified accessible

Live cluster test results (OpenShift Virtualization):
✅ Pod status: Running (stays alive)
✅ Automated mounting: detect-and-mount.sh runs successfully at startup
✅ Filesystem mounted: /dev/fuse on /mnt/filesystems/disk type fuse
✅ Test files accessible: /root/test-file.txt, /etc/oadp-test-marker, /home/testuser/validation.txt
✅ FUSE mounts remain active while pod runs

This completes Issue migtools#6 - file-server container is ready for controller integration (Issue migtools#7).

Created CONTROLLER_INTEGRATION.md with complete implementation guide:

✅ Pre-Deployment Checklist
  - Cluster requirements (OpenShift Virtualization, SCCs, devices)
  - PVC requirements (SELinux labels, ReadWriteOnce conflicts)
  - Image requirements

✅ Pod Specification Requirements
  - Metadata and labels
  - Owner references for automatic cleanup

✅ Security Configuration (CRITICAL)
  - Pod-level: fsGroup 107, SELinux MCS labels
  - Container-level: privileged mode, qemu user/group
  - SELinux label discovery methods (3 approaches)
  - Why privileged mode is required (/dev/kvm + SELinux)

✅ Volume Mounting
  - /dev/fuse and /dev/kvm hostPath volumes
  - PVC mounting (NO readOnly, use subPath)
  - EmptyDir for filesystem mounts
  - Multiple PVC support

✅ Environment Variables
  - HOME=/tmp (required for libguestfs)
  - Optional mount directory overrides

✅ Pod Lifecycle Management
  - Command: detect-and-mount.sh
  - Resource limits (memory/CPU for libguestfs)
  - Restart policy recommendations

✅ Validation and Verification
  - Pod status checks
  - Log message verification
  - Filesystem accessibility tests

✅ Troubleshooting
  - Common issues and solutions
  - Permission denied errors
  - Performance issues
  - SELinux problems

✅ Complete Example Pod Spec
  - Tested and verified in live cluster
  - Ready to use as template

✅ Controller Implementation Checklist
  - Phase 1: Pre-creation validation
  - Phase 2: Pod creation
  - Phase 3: Status monitoring
  - Phase 4: Cleanup

✅ Testing Procedures
  - Manual test steps
  - Expected outputs
  - Verification commands

This guide provides everything needed to implement Issue migtools#7 (VMFR controller)
based on successful live cluster testing from Issue migtools#6.

Successfully merging this pull request may close these issues.

[Subtask] Design and create container with potential plugins as a tools to access restored file systems
