Problem
The model storage path is hardcoded to use platformdirs.user_data_dir, which is problematic for:
- Container deployments (ephemeral filesystems)
- CI/CD environments (need to cache in specific locations)
- Multi-user systems (shared model cache)
- Custom deployment scenarios
Current Behavior
from pathlib import Path
from platformdirs import PlatformDirs
dirs = PlatformDirs(appname=APP_NAME, appauthor=APP_AUTHOR, ensure_exists=True)
MODEL_PATH = Path(dirs.user_data_dir) / MODEL_FILENAME
The path cannot be overridden without modifying the source code.
Expected Behavior
Allow users to specify a custom model directory via an optional environment variable:
import os
from pathlib import Path
from platformdirs import PlatformDirs
# Allow override via environment variable
model_dir = os.getenv("ISCC_SCT_MODEL_DIR")
if model_dir:
    MODEL_PATH = Path(model_dir) / MODEL_FILENAME
else:
    dirs = PlatformDirs(appname=APP_NAME, appauthor=APP_AUTHOR, ensure_exists=True)
    MODEL_PATH = Path(dirs.user_data_dir) / MODEL_FILENAME
# Ensure directory exists
MODEL_PATH.parent.mkdir(parents=True, exist_ok=True)
Use Cases
Container Deployments
# Download model to /app/models during build
ENV ISCC_SCT_MODEL_DIR=/app/models
RUN python -c "import iscc_sct.utils; iscc_sct.utils.get_model()"
CI/CD Caching
- name: Cache iscc-sct model
  uses: actions/cache@v4
  with:
    path: .cache/iscc-sct
    key: iscc-sct-model-v1
- name: Run tests
  env:
    ISCC_SCT_MODEL_DIR: .cache/iscc-sct
  run: pytest
Shared Model Cache
# Multiple users on the same system share one model
export ISCC_SCT_MODEL_DIR=/opt/shared/iscc-sct
python my_app.py
Benefits
- Zero breaking changes (environment variable is optional)
- Follows common practice (HuggingFace uses HF_HOME, PyTorch uses TORCH_HOME, etc.)
- Enables efficient container deployments
- Simplifies CI/CD caching strategies
- Supports shared model caches
Implementation Notes
Also consider supporting ISCC_SCT_TOKENIZER_DIR for the tokenizer model, following the same pattern.
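If both directories follow the same pattern, the override logic could live in one small helper instead of being repeated. The sketch below is only an illustration: resolve_dir is a hypothetical name, not an existing iscc-sct function, and the default argument stands in for the platformdirs user_data_dir path the library uses today. It also assumes that an empty variable should be treated as unset and that "~" in the value should be expanded.

```python
import os
from pathlib import Path


def resolve_dir(env_var: str, default: Path) -> Path:
    """Return the directory named by env_var, or default if it is unset or empty.

    Hypothetical helper; the directory is created if it does not exist.
    """
    override = os.getenv(env_var)
    # An empty string is falsy, so it falls back to the default like an unset variable.
    target = Path(override).expanduser() if override else default
    target.mkdir(parents=True, exist_ok=True)
    return target
```

The model and tokenizer paths could then be resolved as resolve_dir("ISCC_SCT_MODEL_DIR", default_dir) and resolve_dir("ISCC_SCT_TOKENIZER_DIR", default_dir), where default_dir is the current platformdirs location.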