Add environment variable for configurable model storage path #19

@titusz

Problem

The model storage path is hardcoded to use platformdirs.user_data_dir, which is problematic for:

  • Container deployments (ephemeral filesystems)
  • CI/CD environments (need to cache in specific locations)
  • Multi-user systems (shared model cache)
  • Custom deployment scenarios

Current Behavior

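from pathlib import Path
from platformdirs import PlatformDirs

# The model always lands in the per-user data directory chosen by platformdirs
dirs = PlatformDirs(appname=APP_NAME, appauthor=APP_AUTHOR, ensure_exists=True)
MODEL_PATH = Path(dirs.user_data_dir) / MODEL_FILENAME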

The path cannot be overridden without modifying the source code.

Expected Behavior

Allow users to specify a custom model path via environment variable:

import os
from pathlib import Path
from platformdirs import PlatformDirs

# Allow override via environment variable
model_dir = os.getenv("ISCC_SCT_MODEL_DIR")
if model_dir:
    MODEL_PATH = Path(model_dir) / MODEL_FILENAME
else:
    dirs = PlatformDirs(appname=APP_NAME, appauthor=APP_AUTHOR, ensure_exists=True)
    MODEL_PATH = Path(dirs.user_data_dir) / MODEL_FILENAME

# Ensure directory exists
MODEL_PATH.parent.mkdir(parents=True, exist_ok=True)

Use Cases

Container Deployments

# Download model to /app/models during build
ENV ISCC_SCT_MODEL_DIR=/app/models
RUN python -c "import iscc_sct.utils; iscc_sct.utils.get_model()"

CI/CD Caching

- name: Cache iscc-sct model
  uses: actions/cache@v4
  with:
    path: .cache/iscc-sct
    key: iscc-sct-model-v1

- name: Run tests
  env:
    ISCC_SCT_MODEL_DIR: ${{ github.workspace }}/.cache/iscc-sct
  run: pytest

Shared Model Cache

# Multiple users on the same system share one model
export ISCC_SCT_MODEL_DIR=/opt/shared/iscc-sct
python my_app.py

Benefits

  • Zero breaking changes (environment variable is optional)
  • Follows common practice (Hugging Face uses HF_HOME, PyTorch uses TORCH_HOME, etc.)
  • Enables efficient container deployments
  • Simplifies CI/CD caching strategies
  • Supports shared model caches

Implementation Notes

Also consider supporting ISCC_SCT_TOKENIZER_DIR for the tokenizer model, following the same pattern.
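
If both variables are supported, the lookup could live in one small helper so the model and tokenizer paths follow the same rules. The sketch below is only an illustration: `resolve_dir` is a hypothetical name, not an existing iscc-sct function, and stripping whitespace and expanding `~` are assumptions that go slightly beyond the snippet under Expected Behavior.

```python
import os
from pathlib import Path


def resolve_dir(env_var: str, default_dir: Path) -> Path:
    """Return the directory named by env_var, or default_dir if it is unset or empty.

    Hypothetical helper: the name and signature are illustrative only.
    """
    override = os.environ.get(env_var, "").strip()
    # An empty or whitespace-only value is treated the same as an unset variable.
    target = Path(override).expanduser() if override else Path(default_dir)
    # Create the directory so callers can write the downloaded file right away.
    target.mkdir(parents=True, exist_ok=True)
    return target
```

With this in place, the module could build its paths as `resolve_dir("ISCC_SCT_MODEL_DIR", Path(dirs.user_data_dir)) / MODEL_FILENAME`, and the tokenizer path the same way with `ISCC_SCT_TOKENIZER_DIR`. Calling the helper at download time rather than at import time would also let users set the variable after importing the package.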
