Add CLI command to pre-download models #20

@titusz

Problem

There's no explicit way to pre-download the models without writing Python code. This makes it difficult to:

  • Pre-download models in container builds
  • Cache models in CI/CD pipelines
  • Ensure models are available before running the application
  • Verify model integrity explicitly

Current Workaround

Users must write custom Python code:

import iscc_sct.code_semantic_text as sct
import iscc_sct.utils as sct_utils

# Trigger download
sct.model()
# or
sct_utils.get_model()

This is not intuitive and requires understanding the internal API.

Expected Behavior

Add CLI commands to explicitly download, verify, and inspect models:

# Download all models
iscc-sct download-models

# Verify models are present and valid
iscc-sct verify-models

# Show model information
iscc-sct model-info
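Here is a rough sketch of what `download-models` could do for each file. It streams the download into a temporary `.part` file, hashes it on the fly, and moves it into place only when the checksum matches. It uses only the standard library, so SHA-256 from `hashlib` stands in for the BLAKE3 checksum shown in the example output. `download_verified` is a hypothetical helper, not part of iscc-sct.

```python
import hashlib
import os
import urllib.request
from pathlib import Path


def download_verified(url: str, dest: Path, sha256_hex: str, chunk_size: int = 1 << 20) -> Path:
    """Download url to dest, hashing while streaming; keep the file only if the checksum matches."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")  # partial download lives next to the target
    hasher = hashlib.sha256()  # stand-in for BLAKE3, which is not in the standard library
    try:
        with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
            # Read fixed-size chunks so a large model never has to fit in memory
            for chunk in iter(lambda: response.read(chunk_size), b""):
                hasher.update(chunk)
                out.write(chunk)
        if hasher.hexdigest() != sha256_hex.lower():
            raise ValueError(f"Checksum mismatch for {dest.name}")
        os.replace(tmp, dest)  # atomic rename when tmp and dest share a filesystem
    finally:
        # Remove the partial file on any failure (after a successful rename it no longer exists)
        if tmp.exists():
            tmp.unlink()
    return dest
```

Because the file is written to a temporary path and renamed with `os.replace`, an interrupted or corrupted download never leaves a truncated model at the final location for a later `verify-models` run to find.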

Example Output

$ iscc-sct download-models
Downloading ONNX model (iscc-sct-v0.1.0.onnx)...
  URL: https://github.com/iscc/iscc-sct/releases/download/v0.1.0/iscc-sct-v0.1.0.onnx
  Size: 435 MB
  Progress: ████████████████████ 100%
  Checksum: ✓ Verified (BLAKE3)
  Location: /home/user/.local/share/iscc-sct/iscc-sct-v0.1.0.onnx

Downloading tokenizer model...
  ✓ Complete

All models downloaded successfully.

$ iscc-sct verify-models
✓ ONNX model: OK
✓ Tokenizer model: OK

$ iscc-sct model-info
ONNX Model:
  Version: v0.1.0
  Path: /home/user/.local/share/iscc-sct/iscc-sct-v0.1.0.onnx
  Size: 435 MB
  Checksum: valid

Tokenizer Model:
  Path: /home/user/.local/share/iscc-sct/tokenizer
  Size: 1.2 MB
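The `model-info` output needs two small pieces of logic. The tokenizer entry points at a directory, so its size must be summed over the files it contains. Byte counts also need to be formatted as in the example. Below is a standard-library sketch. The helper names are hypothetical, and it assumes 1 MB = 1024 × 1024 bytes.

```python
from pathlib import Path
from typing import List


def path_size(path: Path) -> int:
    """Size in bytes of a file, or the total size of all files below a directory."""
    path = Path(path)
    if path.is_file():
        return path.stat().st_size
    return sum(p.stat().st_size for p in path.rglob("*") if p.is_file())


def human_size(num_bytes: int) -> str:
    """Format a byte count in the style of the example output, e.g. "435 MB" or "1.2 MB"."""
    if num_bytes < 1024:
        return f"{num_bytes} B"
    size = float(num_bytes)
    for unit in ("KB", "MB", "GB"):
        size /= 1024
        if size < 1024 or unit == "GB":
            break
    # One decimal place for small values, whole numbers otherwise
    return f"{size:.1f} {unit}" if size < 10 else f"{size:.0f} {unit}"


def describe_model(name: str, path: Path) -> List[str]:
    """Build the lines that model-info could print for a single model."""
    lines = [f"{name}:", f"  Path: {path}"]
    if path.exists():
        lines.append(f"  Size: {human_size(path_size(path))}")
    else:
        lines.append("  Status: not downloaded")
    return lines
```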

Use Cases

Container Builds

# Pre-download models during build
RUN iscc-sct download-models

CI/CD

- name: Setup models
  run: iscc-sct download-models

- name: Verify models
  run: iscc-sct verify-models

User Scripts

#!/bin/bash
# Ensure models are available before starting app
iscc-sct verify-models || iscc-sct download-models
python app.py
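The `verify-models || download-models` pattern relies on `verify-models` exiting with a non-zero status on failure. The same logic (verify, download only if needed, then verify again) can be sketched in Python. `ensure_model` and its `download` callback are hypothetical, and SHA-256 again stands in for BLAKE3.

```python
import hashlib
from pathlib import Path
from typing import Callable


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large models are not loaded into memory at once."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def is_valid(path: Path, checksum: str) -> bool:
    """A model is valid if the file exists and its checksum matches."""
    return path.is_file() and file_sha256(path) == checksum


def ensure_model(path: Path, checksum: str, download: Callable[[Path], None]) -> bool:
    """Return True if a download was needed; raise if the model is still invalid afterwards."""
    if is_valid(path, checksum):
        return False
    download(path)
    if not is_valid(path, checksum):
        raise RuntimeError(f"{path.name} failed verification after download")
    return True
```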

Implementation

Add the new commands to the existing sct CLI, for example with Typer:

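import typer
from iscc_sct import utils

app = typer.Typer(help="Download, verify, and inspect iscc-sct models.")


@app.command()
def download_models():
    """Download all required models."""
    typer.echo("Downloading models...")
    # get_model() triggers the ONNX model download if the file is missing
    utils.get_model()
    # TODO: download the tokenizer model as well, if it is not bundled
    typer.echo("All models downloaded successfully.")


@app.command()
def verify_models():
    """Verify all models are present and valid."""
    try:
        if not utils.MODEL_PATH.exists():
            raise FileNotFoundError("ONNX model not found")
        # Any exception raised by the integrity check is reported below as a failure
        if not utils.check_integrity(utils.MODEL_PATH, utils.MODEL_CHECKSUM):
            raise ValueError("ONNX model integrity check failed")
        # TODO: verify the tokenizer model too
        typer.echo("✓ All models verified")
    except Exception as e:
        typer.echo(f"✗ Verification failed: {e}", err=True)
        # A non-zero exit code lets shell scripts use `verify-models || download-models`
        raise typer.Exit(code=1)


@app.command()
def model_info():
    """Show information about downloaded models."""
    path = utils.MODEL_PATH
    typer.echo("ONNX Model:")
    typer.echo(f"  Path: {path}")
    if path.exists():
        typer.echo(f"  Size: {path.stat().st_size / (1024 * 1024):.0f} MB")
    else:
        typer.echo("  Status: not downloaded")
    # TODO: add version, checksum status, and tokenizer details


if __name__ == "__main__":
    app()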
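Typer is one option. If the CLI is built on the standard library's `argparse` instead (an assumption; the issue does not say which framework the existing sct CLI uses), the three subcommands can be registered as subparsers whose handlers return an exit code. The `handlers` mapping and its functions are hypothetical placeholders for the real download, verify, and info logic.

```python
import argparse
from typing import Callable, Dict, Optional, Sequence

# Hypothetical handler type: each subcommand returns a process exit code
Handler = Callable[[], int]


def build_parser(handlers: Dict[str, Handler]) -> argparse.ArgumentParser:
    """Register one subcommand per handler, e.g. "download-models"."""
    parser = argparse.ArgumentParser(prog="iscc-sct")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name, handler in handlers.items():
        subparser = subparsers.add_parser(name, help=handler.__doc__)
        subparser.set_defaults(handler=handler)
    return parser


def main(handlers: Dict[str, Handler], argv: Optional[Sequence[str]] = None) -> int:
    """Parse argv, run the selected handler, and return its exit code."""
    args = build_parser(handlers).parse_args(argv)
    return args.handler()
```

An entry point would then call `sys.exit(main({...}))`. An unknown subcommand makes argparse print an error and exit with status 2.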

Benefits

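  • Clear, explicit way to manage models without writing Python code
  • Better user experience: models can be prepared with a single command
  • Simplifies container and CI/CD workflows
  • Follows common CLI conventions, such as a non-zero exit code when verification fails, so the commands compose in shell scripts
  • Makes model management transparent: users can see where models are stored and whether they are valid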

Metadata

Assignees

No one assigned

Labels

No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests