Shared infrastructure and utilities used by all METAINFORMANT domain modules. The core package provides battle-tested components for I/O, configuration, logging, parallel execution, caching, database connectivity, and workflow orchestration.
- Getting Started — 5-minute tutorial with complete pipeline example
- Architecture — System design, component interactions, and principles
| Component | Description | Documentation |
|---|---|---|
| I/O Operations | File I/O, JSON/CSV/TSV/YAML, downloads, atomic writes | core.io |
| Configuration | Config loading, environment overrides, merging | core.utils.config |
| Path Handling | Path resolution, security, sanitization | core.io.paths |
| Logging | Structured logging, metadata, environment config | core.utils.logging |
| Caching | JSON cache with TTL, thread-safe operations | core.io.cache |
| Download | Robust HTTP/FTP downloads, retry, resume, heartbeat | core.io.download |
| Parallel Execution | Thread/process pools, resource-aware workers | core.execution.parallel |
| Database | PostgreSQL connectivity, connection pooling | core.data.db |
| Hashing | SHA256 file and content hashing | core.utils.hash |
| Text Processing | Text cleaning, slugify, gene name standardization | core.utils.text |
| Workflow | DAG orchestration, config-driven pipelines | core.execution.workflow |
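
To make the "JSON cache with TTL" row concrete, here is a minimal sketch of the idea using only the standard library. It is not the `core.io.cache` implementation: `save_entry` and `load_entry` are hypothetical names, and using the file's modification time as the entry's age is an assumption.

```python
import json
import time
from pathlib import Path
from typing import Any, Optional


def save_entry(cache_dir: Path, key: str, value: Any) -> Path:
    """Store value as JSON in cache_dir/<key>.json (hypothetical helper)."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{key}.json"
    path.write_text(json.dumps(value), encoding="utf-8")
    return path


def load_entry(cache_dir: Path, key: str, ttl_seconds: float) -> Optional[Any]:
    """Return the cached value, or None if it is missing or older than the TTL."""
    path = cache_dir / f"{key}.json"
    if not path.exists():
        return None
    # Assumption: the file's modification time marks when the entry was cached.
    age_seconds = time.time() - path.stat().st_mtime
    if age_seconds > ttl_seconds:
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

A caller checks `load_entry(...)` first and only recomputes or re-downloads when it returns `None`.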
```
src/metainformant/core/
├── io/                     # Input/Output operations
│   ├── io.py               # Core file I/O (JSON, CSV, YAML, Parquet)
│   ├── paths.py            # Path utilities and security
│   ├── cache.py            # JSON caching with TTL
│   ├── download.py         # Download with retry/resume/heartbeat
│   ├── atomic.py           # Atomic file operations
│   ├── checksums.py        # Checksum verification
│   └── disk.py             # Disk space management
├── utils/                  # Utility functions
│   ├── logging.py          # Structured logging
│   ├── config.py           # Configuration loader
│   ├── hash.py             # SHA256 hashing
│   ├── text.py             # Text processing
│   ├── errors.py           # Error hierarchy
│   └── timing.py           # Performance timing
├── execution/              # Execution engines
│   ├── parallel.py         # Parallel execution utilities
│   ├── workflow.py         # Workflow orchestration
│   └── discovery.py        # Symbol discovery
├── data/                   # Data layer
│   ├── db.py               # PostgreSQL integration
│   └── validation.py       # Validation utilities
├── engine/                 # Pipeline engines
│   └── workflow_manager.py
└── ui/                     # User interfaces
    └── tui.py              # Terminal UI
```
All tests use real implementations. No mock objects. This ensures production reliability.
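
As an illustration of the no-mocks policy, a test can exercise real file I/O inside a temporary directory instead of patching the filesystem. The sketch below assumes pytest's built-in `tmp_path` fixture; `write_records` is a hypothetical function under test, not part of the package.

```python
import json
from pathlib import Path


def write_records(path: Path, records: list[dict]) -> None:
    """Hypothetical function under test: write one JSON object per line."""
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def test_write_records_roundtrip(tmp_path: Path) -> None:
    # tmp_path is a real temporary directory supplied by pytest; nothing is mocked.
    target = tmp_path / "records.jsonl"
    write_records(target, [{"id": 1}, {"id": 2}])
    lines = target.read_text(encoding="utf-8").splitlines()
    assert [json.loads(line) for line in lines] == [{"id": 1}, {"id": 2}]
```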
All file writes use atomic replacement (temp file → rename) to prevent corruption.
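
The temp file → rename pattern can be sketched with the standard library as follows. This is an illustration of the technique, not the code in `atomic.py`; `atomic_write_text` is a hypothetical name.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    """Write text to path so readers see either the old or the new file, never a partial one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the target directory so the final rename
    # stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, path)  # replaces any existing file in one step
    except BaseException:
        # Clean up the temp file if the write or the rename failed.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

If the process dies before `os.replace`, the original file is untouched and only a stray `.tmp` file may remain.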
- Consistent log format: `TIMESTAMP | LEVEL | MODULE | MESSAGE`
- Optional structured metadata via `log_with_metadata()`
- Download heartbeats for progress tracking
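
For reference, a `TIMESTAMP | LEVEL | MODULE | MESSAGE` layout can be produced with the standard `logging` module as sketched below. `make_logger` is a hypothetical helper, and mapping MODULE to the logger name is an assumption; the package's own `get_logger` may be configured differently.

```python
import logging
from typing import Optional, TextIO

# Assumption: MODULE corresponds to the logger name, e.g. the value of __name__.
LOG_FORMAT = "%(asctime)s | %(levelname)s | %(name)s | %(message)s"


def make_logger(
    name: str, level: int = logging.INFO, stream: Optional[TextIO] = None
) -> logging.Logger:
    """Return a logger that emits pipe-separated records (hypothetical helper)."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False  # avoid duplicate lines from the root logger
    if not logger.handlers:  # attach the handler only once per logger name
        handler = logging.StreamHandler(stream)  # stream=None means stderr
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    return logger
```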
- Path traversal prevention (`is_safe_path()`)
- Filename sanitization
- SQL injection protection (`sanitize_connection_params()`)
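
Path traversal checks usually come down to resolving a candidate path and confirming that it still lies under an allowed base directory. The sketch below shows that idea; `is_within_base` is a hypothetical name, and the actual `is_safe_path()` may apply different or additional rules.

```python
from pathlib import Path
from typing import Union


def is_within_base(candidate: Union[str, Path], base: Union[str, Path]) -> bool:
    """Return True if candidate, interpreted relative to base, stays inside base."""
    base_resolved = Path(base).resolve()
    # Joining with an absolute candidate discards base, so absolute paths
    # outside base are rejected by the containment check below.
    target = (base_resolved / candidate).resolve()
    return target == base_resolved or base_resolved in target.parents
```

Resolving before comparing means `..` segments and symlinks are collapsed first, so `a/../../secret` is judged by where it actually points.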
- Pure `pathlib.Path` (no `os.path`)
- UTF-8 everywhere
- Minimal external dependencies (optional)
```python
# Standard import pattern
from pathlib import Path

from metainformant.core import io, cache, paths
from metainformant.core.utils import logging, config

# Get logger
logger = logging.get_logger(__name__)

# Ensure directories
output = paths.ensure_directory(Path("output"))
cache_dir = Path("output") / "cache"  # placeholder cache location
url = "https://example.org/data.json"  # placeholder URL

# Load configuration
cfg = config.load_mapping_from_file("config.yaml")

# Download with caching: reuse a cached copy younger than one hour
data = cache.load_cached_json(cache_dir, "key", ttl_seconds=3600)
if data is None:
    data = io.download_json(url)
    cache.cache_json(cache_dir, "key", data)

# Process files (process() stands in for your own per-record function)
for item in io.read_jsonl("data.jsonl"):
    process(item)

logger.info("Pipeline complete")
```

| Variable | Purpose | Default |
|---|---|---|
| `CORE_LOG_LEVEL` | Logging level (DEBUG, INFO, WARNING, ERROR) | INFO |
| `AK_THREADS` | Override default thread count | CPU-dependent |
| `AK_WORK_DIR` | Working directory for outputs | output/ |
| `AK_LOG_DIR` | Directory for log files | logs/ |
| `PG_HOST` | PostgreSQL host | localhost |
| `PG_PORT` | PostgreSQL port | 5432 |
| `PG_DATABASE` | Database name | metainformant |
| `PG_USER` | Database user | postgres |
| `PG_PASSWORD` | Database password | (empty) |
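
The PostgreSQL variables and their defaults can be read in one place, as in the following sketch. `PostgresSettings` and `postgres_settings_from_env` are hypothetical names used for illustration; how `core.data.db` actually consumes these variables is not shown here.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PostgresSettings:
    host: str
    port: int
    database: str
    user: str
    password: str


def postgres_settings_from_env(env: Mapping[str, str] = os.environ) -> PostgresSettings:
    """Build settings from PG_* variables, falling back to the documented defaults."""
    return PostgresSettings(
        host=env.get("PG_HOST", "localhost"),
        port=int(env.get("PG_PORT", "5432")),  # raises ValueError on a non-numeric port
        database=env.get("PG_DATABASE", "metainformant"),
        user=env.get("PG_USER", "postgres"),
        password=env.get("PG_PASSWORD", ""),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to exercise without touching the real environment.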
Build with:

```bash
uv run python scripts/package/uv_docs.sh
```

Or manually:

```bash
cd docs
sphinx-build -b html . _build
```

When modifying core components:

- Add tests in `tests/test_core_*.py` (no mocks!)
- Update documentation in `docs/core/*.md`
- Follow conventions: `pathlib.Path`, type hints, `get_logger(__name__)`
- Check AGENTS.md: `src/metainformant/core/AGENTS.md` has agent-specific rules

- Full API Reference — Type signatures and data structures
- Agent Directives — Documentation agent guidelines
- Examples Directory — Runnable code samples
- Source Code — Implementation details