Database Storage Setup Guide

Complete guide to deploying and configuring database storage for Konveyor IQ.

Prerequisites

Python 3.9+
Git (for version tracking in database)
One of:
- SQLite (built into Python - no install needed)
- PostgreSQL 12+ (for team/production use)

Installation Steps

1. Install Python Dependencies

Option A: SQLite (Recommended for Personal Use)

# Install SQLAlchemy (only additional dependency needed)
pip install sqlalchemy>=2.0.0

# Or install from storage requirements file
pip install -r storage/requirements.txt

Total dependencies added: 1 package (sqlalchemy)

Option B: PostgreSQL (Team/Production Use)

# Install SQLAlchemy + PostgreSQL adapter
pip install sqlalchemy>=2.0.0 psycopg2-binary>=2.9.0

# Or uncomment PostgreSQL line in storage/requirements.txt and run:
pip install -r storage/requirements.txt

Total dependencies added: 2 packages (sqlalchemy, psycopg2-binary)

2. Update Main Requirements (Optional)

If you want database dependencies available by default, update the main requirements.txt:

# Add to requirements.txt
echo "sqlalchemy>=2.0.0  # Database storage backend" >> requirements.txt

# For PostgreSQL support
echo "psycopg2-binary>=2.9.0  # PostgreSQL adapter (optional)" >> requirements.txt

3. Configure Storage Backend

Edit config.yaml:

For SQLite (Local Development)

storage:
  type: "sqlite"
  path: "konveyor_iq.db"

reporting:
  write_to_database: true  # Enable DB writes

For PostgreSQL (Production)

storage:
  type: "postgresql"
  connection_string: "${DATABASE_URL}"

reporting:
  write_to_database: true

Set environment variable:

export DATABASE_URL="postgresql://user:password@localhost:5432/konveyor_iq"

4. Initialize Database Schema

python db_cli.py init --config config.yaml

Expected output:

Initializing database: sqlite
✓ Database initialized successfully

5. Verify Installation

# Check that tables were created (SQLite)
sqlite3 konveyor_iq.db ".tables"

# Expected output:
# cost_tracking            performance_alerts
# evaluation_runs          rule_performance_summary
# rules                    run_models
# test_results

# Check that CLI works
python db_cli.py query runs --limit 5

PostgreSQL Setup (Detailed)

Local Development

macOS (Homebrew)

# Install PostgreSQL
brew install postgresql@15
brew services start postgresql@15

# Create database
createdb konveyor_iq

# Create user (optional)
psql postgres -c "CREATE USER konveyor WITH PASSWORD 'yourpassword';"
psql postgres -c "GRANT ALL PRIVILEGES ON DATABASE konveyor_iq TO konveyor;"

# Set connection string
export DATABASE_URL="postgresql://konveyor:yourpassword@localhost:5432/konveyor_iq"

# Initialize schema
python db_cli.py init --config config.yaml

Ubuntu/Debian

# Install PostgreSQL
sudo apt-get update
sudo apt-get install postgresql postgresql-contrib

# Start service
sudo systemctl start postgresql
sudo systemctl enable postgresql

# Create database and user
sudo -u postgres psql -c "CREATE DATABASE konveyor_iq;"
sudo -u postgres psql -c "CREATE USER konveyor WITH PASSWORD 'yourpassword';"
sudo -u postgres psql -c "GRANT ALL PRIVILEGES ON DATABASE konveyor_iq TO konveyor;"

# Set connection string
export DATABASE_URL="postgresql://konveyor:yourpassword@localhost:5432/konveyor_iq"

# Initialize schema
python db_cli.py init --config config.yaml

Docker

# Run PostgreSQL container
docker run -d \
  --name konveyor-iq-db \
  -e POSTGRES_DB=konveyor_iq \
  -e POSTGRES_USER=konveyor \
  -e POSTGRES_PASSWORD=yourpassword \
  -p 5432:5432 \
  -v konveyor_iq_data:/var/lib/postgresql/data \
  postgres:15

# Wait for container to start
sleep 5

# Set connection string
export DATABASE_URL="postgresql://konveyor:yourpassword@localhost:5432/konveyor_iq"

# Initialize schema
python db_cli.py init --config config.yaml

Production Deployment

AWS RDS

Create RDS PostgreSQL Instance
- Engine: PostgreSQL 15
- Instance class: db.t3.micro (start small)
- Storage: 20GB GP3
- Enable automated backups
- VPC: Configure security groups

Get Connection Details

Endpoint: konveyor-iq.xxxxx.us-east-1.rds.amazonaws.com
Port: 5432
Database: konveyor_iq

Set Connection String

export DATABASE_URL="postgresql://admin:password@konveyor-iq.xxxxx.us-east-1.rds.amazonaws.com:5432/konveyor_iq"

Initialize Schema

python db_cli.py init --config config.yaml

Google Cloud SQL

Create Cloud SQL Instance

gcloud sql instances create konveyor-iq \
  --database-version=POSTGRES_15 \
  --tier=db-f1-micro \
  --region=us-central1

Create Database

gcloud sql databases create konveyor_iq --instance=konveyor-iq

Create User

gcloud sql users create konveyor \
  --instance=konveyor-iq \
  --password=yourpassword

Get Connection Name

gcloud sql instances describe konveyor-iq --format="value(connectionName)"
# Output: project-id:region:instance-name

Connect via Cloud SQL Proxy

cloud_sql_proxy -instances=project-id:region:konveyor-iq=tcp:5432

Set Connection String

export DATABASE_URL="postgresql://konveyor:yourpassword@localhost:5432/konveyor_iq"

Azure Database for PostgreSQL

Create PostgreSQL Server

az postgres server create \
  --resource-group konveyor-rg \
  --name konveyor-iq-db \
  --location eastus \
  --admin-user konveyor \
  --admin-password yourpassword \
  --sku-name B_Gen5_1

Configure Firewall

az postgres server firewall-rule create \
  --resource-group konveyor-rg \
  --server konveyor-iq-db \
  --name AllowMyIP \
  --start-ip-address YOUR_IP \
  --end-ip-address YOUR_IP

Create Database

az postgres db create \
  --resource-group konveyor-rg \
  --server-name konveyor-iq-db \
  --name konveyor_iq

Set Connection String

export DATABASE_URL="postgresql://konveyor@konveyor-iq-db:yourpassword@konveyor-iq-db.postgres.database.azure.com:5432/konveyor_iq?sslmode=require"

Deployment Checklist

Pre-Deployment

Install Python dependencies (sqlalchemy, psycopg2-binary if PostgreSQL)
Choose storage backend (SQLite vs PostgreSQL)
Set up database (create DB, configure credentials)
Update config.yaml with storage settings
Set environment variables (DATABASE_URL if PostgreSQL)
Test connection: python db_cli.py init --config config.yaml

Post-Deployment

Run test evaluation to verify database writes work
Query recent runs: python db_cli.py query runs
Set up backups (automated for cloud DBs)
Configure monitoring/alerting (optional)
Document connection details for team

Configuration Examples

Development (SQLite)

# config.yaml
storage:
  type: "sqlite"
  path: "konveyor_iq.db"

reporting:
  formats:
    - html
    - markdown
  output_dir: "results/"
  write_to_database: true  # Write to both files AND database

Staging (PostgreSQL - Single Server)

# config.yaml
storage:
  type: "postgresql"
  connection_string: "${DATABASE_URL}"

reporting:
  formats:
    - html
  output_dir: "results/"
  write_to_database: true

# .env file (for staging server)
DATABASE_URL=postgresql://konveyor:stagingpass@staging-db.internal:5432/konveyor_iq

Production (PostgreSQL - Multi-Region)

# config.yaml
storage:
  type: "postgresql"
  connection_string: "${DATABASE_URL}"

reporting:
  formats:
    - html
  output_dir: "s3://konveyor-iq-results/"  # Optional: Store HTML reports in S3
  write_to_database: true

# .env file (production)
DATABASE_URL=postgresql://konveyor:prod_secret@prod-db.us-east-1.rds.amazonaws.com:5432/konveyor_iq?sslmode=require

CI/CD Integration

GitHub Actions

# .github/workflows/evaluate.yml
name: Konveyor IQ Evaluation

on:
  push:
    branches: [main]
  schedule:
    - cron: '0 0 * * *'  # Daily at midnight

jobs:
  evaluate:
    runs-on: ubuntu-latest

    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_DB: konveyor_iq
          POSTGRES_USER: konveyor
          POSTGRES_PASSWORD: test_password
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 5432:5432

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install -r storage/requirements.txt

      - name: Initialize database
        env:
          DATABASE_URL: postgresql://konveyor:test_password@localhost:5432/konveyor_iq
        run: |
          python db_cli.py init --config config.yaml

      - name: Run evaluation
        env:
          DATABASE_URL: postgresql://konveyor:test_password@localhost:5432/konveyor_iq
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          python evaluate.py --benchmark benchmarks/test_cases/generated/quarkus.yaml

      - name: Check for regressions
        env:
          DATABASE_URL: postgresql://konveyor:test_password@localhost:5432/konveyor_iq
        run: |
          python db_cli.py query regressions --threshold 10
          # Exit code will be non-zero if regressions found

GitLab CI

# .gitlab-ci.yml
stages:
  - evaluate
  - analyze

variables:
  DATABASE_URL: "postgresql://konveyor:password@postgres:5432/konveyor_iq"

services:
  - postgres:15

before_script:
  - pip install -r requirements.txt
  - pip install -r storage/requirements.txt
  - python db_cli.py init --config config.yaml

evaluate:
  stage: evaluate
  script:
    - python evaluate.py --benchmark benchmarks/test_cases/generated/quarkus.yaml
  artifacts:
    paths:
      - results/

analyze:
  stage: analyze
  script:
    - python db_cli.py query models --days 7
    - python db_cli.py query regressions --threshold 10

Backup and Recovery

SQLite Backups

# Backup
cp konveyor_iq.db konveyor_iq.db.backup-$(date +%Y%m%d)

# Or use SQLite backup command
sqlite3 konveyor_iq.db ".backup 'konveyor_iq.db.backup'"

# Restore
cp konveyor_iq.db.backup konveyor_iq.db

# Automated daily backups
echo "0 2 * * * cp /path/to/konveyor_iq.db /backups/konveyor_iq.db.\$(date +\%Y\%m\%d)" | crontab -

PostgreSQL Backups

# Backup (dump)
pg_dump -h localhost -U konveyor konveyor_iq > backup-$(date +%Y%m%d).sql

# Restore
psql -h localhost -U konveyor konveyor_iq < backup-20250125.sql

# Automated backups (cron)
echo "0 2 * * * pg_dump -h localhost -U konveyor konveyor_iq > /backups/konveyor_iq-\$(date +\%Y\%m\%d).sql" | crontab -

Troubleshooting

SQLite Issues

Error: "database is locked"

# SQLite doesn't handle concurrent writes well
# Solution: Use PostgreSQL for CI/CD, or serialize evaluations

Error: "unable to open database file"

# Check file permissions
ls -la konveyor_iq.db

# Fix permissions
chmod 644 konveyor_iq.db

PostgreSQL Issues

Error: "FATAL: role 'konveyor' does not exist"

# Create user
psql postgres -c "CREATE USER konveyor WITH PASSWORD 'yourpassword';"

Error: "FATAL: database 'konveyor_iq' does not exist"

# Create database
createdb konveyor_iq
# Or
psql postgres -c "CREATE DATABASE konveyor_iq;"

Error: "connection refused"

# Check PostgreSQL is running
pg_isready -h localhost -p 5432

# Start PostgreSQL
# macOS:
brew services start postgresql@15
# Linux:
sudo systemctl start postgresql

Error: "could not connect to server: Connection refused"

# Check connection string
echo $DATABASE_URL

# Verify PostgreSQL is listening
netstat -an | grep 5432

Dependency Issues

Error: "No module named 'sqlalchemy'"

pip install sqlalchemy>=2.0.0

Error: "No module named 'psycopg2'"

pip install psycopg2-binary>=2.9.0

Monitoring

Basic Health Check

# Check database connectivity
python -c "from storage import get_storage; storage = get_storage({'type': 'sqlite', 'path': 'konveyor_iq.db'}); print('✓ Connected'); storage.close()"

# Check recent activity
python db_cli.py query runs --limit 5

PostgreSQL Monitoring

-- Active connections
SELECT count(*) FROM pg_stat_activity WHERE datname = 'konveyor_iq';

-- Database size
SELECT pg_size_pretty(pg_database_size('konveyor_iq'));

-- Table sizes
SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;

-- Slow queries (if logging enabled)
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

Performance Tuning

PostgreSQL Configuration

# /etc/postgresql/15/main/postgresql.conf

# Memory settings
shared_buffers = 256MB
effective_cache_size = 1GB
work_mem = 16MB

# Connection settings
max_connections = 100

# Query performance
random_page_cost = 1.1  # For SSD
effective_io_concurrency = 200

# Write performance
checkpoint_completion_target = 0.9
wal_buffers = 16MB

Index Optimization

-- Check index usage
SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan as index_scans
FROM pg_stat_user_indexes
ORDER BY idx_scan;

-- Rebuild indexes if needed
REINDEX TABLE test_results;

Migration from File Storage

If you have existing file-based results and want to import them:

# import_historical_data.py
import json
import glob
from pathlib import Path
from storage.writer import DatabaseWriter

config = {'type': 'sqlite', 'path': 'konveyor_iq.db'}

with DatabaseWriter(config) as writer:
    # Find all result files
    for run_dir in Path('results/').iterdir():
        if not run_dir.is_dir():
            continue

        # Read run metadata
        run_file = run_dir / 'run.json'
        if not run_file.exists():
            continue

        with open(run_file) as f:
            run_data = json.load(f)

        # Create run in DB
        run_id = writer.start_run(**run_data)

        # Import results
        results_file = run_dir / 'results' / 'test_results.jsonl'
        if results_file.exists():
            with open(results_file) as f:
                for line in f:
                    result = json.loads(line)
                    writer.write_result(run_id, result)

        writer.complete_run(run_id)

print("Historical data imported successfully!")

Summary

Minimum Setup (SQLite):

pip install sqlalchemy
Update config.yaml with SQLite settings
python db_cli.py init
Run evaluations as normal

Production Setup (PostgreSQL):

pip install sqlalchemy psycopg2-binary
Set up PostgreSQL (local/cloud)
Update config.yaml with connection string
python db_cli.py init
Configure backups and monitoring

Total setup time: 5-10 minutes for SQLite, 30-60 minutes for PostgreSQL (including cloud setup).

FilesExpand file tree

database_setup_guide.md

Latest commit

History

database_setup_guide.md

File metadata and controls

Database Storage Setup Guide

Prerequisites

Installation Steps

1. Install Python Dependencies

Option A: SQLite (Recommended for Personal Use)

Option B: PostgreSQL (Team/Production Use)

2. Update Main Requirements (Optional)

3. Configure Storage Backend

For SQLite (Local Development)

For PostgreSQL (Production)

4. Initialize Database Schema

5. Verify Installation

PostgreSQL Setup (Detailed)

Local Development

macOS (Homebrew)

Ubuntu/Debian

Docker

Production Deployment

AWS RDS

Google Cloud SQL

Azure Database for PostgreSQL

Deployment Checklist

Pre-Deployment

Post-Deployment

Configuration Examples

Development (SQLite)

Staging (PostgreSQL - Single Server)

Production (PostgreSQL - Multi-Region)

CI/CD Integration

GitHub Actions

GitLab CI

Backup and Recovery

SQLite Backups

PostgreSQL Backups

Troubleshooting

SQLite Issues

PostgreSQL Issues

Dependency Issues

Monitoring

Basic Health Check

PostgreSQL Monitoring

Performance Tuning

PostgreSQL Configuration

Index Optimization

Migration from File Storage

Summary