Every change to ThinkOps must respect these principles. The enhance loop and all contributors should treat violations as bugs.
The TypeScript orchestrator is thin plumbing — config, spawning, splitting, tracking. All intelligence lives in prompt templates (prompts/*.md). Change what the system does by editing markdown, not code. The orchestrator decides when and how to spawn; prompts decide what to do.
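A minimal sketch of that split, assuming illustrative names (`loadPrompt`, `spawnAgent`, and the `agent-cli` binary are placeholders, not the actual orchestrator API):

```typescript
import { readFile } from "node:fs/promises";
import { spawn } from "node:child_process";

// The orchestrator only reads a template and hands it to a worker.
// Behavioral changes happen in prompts/*.md, never in this file.
async function loadPrompt(name: string): Promise<string> {
  return readFile(`prompts/${name}.md`, "utf8");
}

function spawnAgent(prompt: string): void {
  // "agent-cli" stands in for whatever binary runs the LLM task.
  const child = spawn("agent-cli", ["--prompt-stdin"], {
    stdio: ["pipe", "inherit", "inherit"],
  });
  child.stdin!.write(prompt);
  child.stdin!.end();
}

// Plumbing decides when and how; markdown decides what.
loadPrompt("preflight").then(spawnAgent);
```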
Task splitting, parallelism, and pipeline enforcement happen in the orchestrator — never via prompt engineering. The LLM is never asked "should you run these in parallel?" The orchestrator decides based on structured output from preflight (fast/full flags, priorities).
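A sketch of how the orchestrator might turn preflight's structured output into a schedule; the `PreflightResult` shape and field names are assumptions, not the real contract:

```typescript
// Hypothetical shape of preflight's structured output.
interface PreflightResult {
  mode: "fast" | "full";
  priority: number; // lower runs first
  connector: string;
}

// The orchestrator, not the LLM, decides parallelism: fast tasks fan
// out as one parallel batch, full tasks run one at a time by priority.
function schedule(results: PreflightResult[]): PreflightResult[][] {
  const fast = results.filter((r) => r.mode === "fast");
  const full = results
    .filter((r) => r.mode === "full")
    .sort((a, b) => a.priority - b.priority);
  return [fast, ...full.map((r) => [r])];
}
```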
Every cycle follows an escalating cost ladder:
- Check command — zero LLM cost (SHA-256 fingerprint)
- Generated tool — zero LLM cost (bash script from pattern detection)
- Preflight LLM — cheap (bare mode, ~15-20K tokens saved)
- Full execution — expensive (only when real work exists)
Never skip a cheaper layer to jump to an expensive one. New features should add cheap checks, not more LLM calls.
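A condensed sketch of the ladder; `preflight`, `fullExecution`, and the `tools/` path are illustrative stand-ins for the real entry points:

```typescript
import { createHash } from "node:crypto";
import { execFileSync } from "node:child_process";

function fingerprint(input: string): string {
  return createHash("sha256").update(input).digest("hex");
}

async function runCycle(
  connector: string,
  input: string,
  lastSeen: string | null,
): Promise<void> {
  if (lastSeen === fingerprint(input)) return; // layer 1: hash match, zero cost

  try {
    execFileSync(`tools/${connector}.sh`); // layer 2: generated bash, zero LLM cost
    return;
  } catch {
    // no generated tool yet, or it failed: climb exactly one rung
  }

  const plan = await preflight(connector); // layer 3: cheap bare-mode LLM
  if (plan.hasWork) await fullExecution(connector); // layer 4: only for real work
}

// Stubs standing in for the real LLM entry points.
async function preflight(c: string): Promise<{ hasWork: boolean }> {
  return { hasWork: true };
}
async function fullExecution(c: string): Promise<void> {}
```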
Three feedback loops at different speeds, each catching what the others miss:
- Muscle memory (per-task): Record actions → fingerprint sequences → detect repeats → generate scripts → replace LLM with bash. Self-healing: script fails → fall back to LLM silently.
- Peer review (per-cycle): Critic adversarially challenges claims → eval scores quality → route findings (SKILL/CODE/CRITICAL). The system internalizes patterns from its own reviews.
- Retrospective (daily): Check outcomes after completion (PR merged? reverted?) → extract lessons → load into next preflight. A task that looked good at eval time but failed in production gets remembered.
All three feed into preflight context for the next run. Lessons from one connector inform others.
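A sketch of the muscle-memory fingerprinting step; the threshold and function names are illustrative, not the real values:

```typescript
import { createHash } from "node:crypto";

// Hash an ordered action sequence so identical task shapes collide.
function sequenceFingerprint(actions: string[]): string {
  return createHash("sha256").update(actions.join("\n")).digest("hex");
}

const seen = new Map<string, number>();
const REPEAT_THRESHOLD = 3; // illustrative; the real cutoff may differ

// After each task, record its action trace. Once the same sequence has
// repeated enough times, it becomes a candidate for a generated script.
function recordTask(actions: string[]): boolean {
  const fp = sequenceFingerprint(actions);
  const count = (seen.get(fp) ?? 0) + 1;
  seen.set(fp, count);
  return count >= REPEAT_THRESHOLD; // true => generate a tool, replace the LLM
}
```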
Failures are classified and each gets exactly one response path:
- Auth errors → backoff, skip recovery (retrying won't help)
- Rate limits → exponential backoff with reset on success
- Execution failures → recovery pipeline (retry → escalate → abandon, max 3 attempts)
- Human needed → release concurrency slot, ask via Telegram, resume
Never conflate failure types. Never double-log the same failure.
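A sketch of the one-path-per-class rule; the stub names are placeholders for the real response paths:

```typescript
type FailureClass = "auth" | "rate-limit" | "execution" | "human-needed";

// Stubs for the four response paths; real implementations live elsewhere.
function backoff(opts: { retry: boolean }): void {}
function exponentialBackoff(): void {}
function recover(opts: { maxAttempts: number }): void {}
function releaseSlot(): void {}
function askViaTelegram(): void {}

// Classification happens exactly once, so each failure takes exactly
// one path and is never double-logged under two names.
function respond(kind: FailureClass): void {
  switch (kind) {
    case "auth":
      backoff({ retry: false }); // retrying won't help
      break;
    case "rate-limit":
      exponentialBackoff(); // resets on success
      break;
    case "execution":
      recover({ maxAttempts: 3 }); // retry → escalate → abandon
      break;
    case "human-needed":
      releaseSlot();
      askViaTelegram(); // resume once a human answers
      break;
  }
}
```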
No task completes without flowing through critic → eval → learn. The critic checks whether actual work matches claims. Eval scores quality and routes findings. Skipping eval (e.g., due to auth errors) must be logged as a warning, never silently swallowed.
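A sketch of that pipeline; `critic`, `evaluate`, and `learn` are stand-ins for the actual stage implementations:

```typescript
interface Task {
  id: string;
}

// Stubs for the three stages; names are illustrative.
async function critic(t: Task): Promise<string> {
  return "ok";
}
async function evaluate(t: Task, verdict: string): Promise<number> {
  return 1;
}
async function learn(t: Task, score: number): Promise<void> {}

// Every completed task flows critic → eval → learn. If eval cannot run
// (e.g. an auth error), that is logged as a warning, never swallowed.
async function review(task: Task): Promise<void> {
  const verdict = await critic(task); // does the actual work match the claims?
  try {
    const score = await evaluate(task, verdict);
    await learn(task, score); // route findings: SKILL / CODE / CRITICAL
  } catch (err) {
    console.warn(`eval skipped for ${task.id}: ${err}`); // warning, not silence
  }
}
```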
A connector is a repeating concern, not a one-shot task. It's polled adaptively — fast when active (10min), slow when idle (up to 1hr). The system never "finishes" a connector; it keeps checking. Empty or placeholder connectors (< 100 bytes) should be filtered out, not polled.
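A sketch of the adaptive interval, assuming a simple doubling policy (the real backoff schedule may differ):

```typescript
const FAST_MS = 10 * 60 * 1000; // 10 min while the connector is active
const MAX_MS = 60 * 60 * 1000; //  back off to 1 hr when idle

// Double the interval on each idle poll; reset on activity.
function nextInterval(current: number, sawActivity: boolean): number {
  return sawActivity ? FAST_MS : Math.min(current * 2, MAX_MS);
}

// Connectors under 100 bytes are placeholders and are never polled.
function shouldPoll(connectorBytes: number): boolean {
  return connectorBytes >= 100;
}
```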
Every spawn logs to _run_log.md. Every task outcome goes to per-connector audit logs. The dashboard shows real-time state via SSE. gather-state.sh runs diagnostics at zero LLM cost. If something happens and nobody can see it, that's a bug.
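A minimal sketch of the spawn logging, assuming `_run_log.md` is an append-only markdown file:

```typescript
import { appendFile } from "node:fs/promises";

// Every spawn leaves a trace; if it isn't in _run_log.md, it didn't happen.
async function logSpawn(taskId: string, connector: string): Promise<void> {
  const line = `- ${new Date().toISOString()} spawned ${taskId} for ${connector}\n`;
  await appendFile("_run_log.md", line);
}
```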
Non-critical failures never block progress:
- Critic/eval auth error → log warning, continue
- Skill selection fails → run with empty context
- Generated tool fails → fall back to LLM
- Merge conflict in enhance loop → abort merge, continue cycle
The system should always make forward progress, even in a degraded state.
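A sketch of a generic degrade-and-continue wrapper covering all four cases; `degradeGracefully` is an illustrative name, not an existing helper:

```typescript
// Run a non-critical step; on failure, log a warning and return the
// fallback so the cycle keeps moving in a degraded state.
async function degradeGracefully<T>(
  step: () => Promise<T>,
  fallback: T,
  label: string,
): Promise<T> {
  try {
    return await step();
  } catch (err) {
    console.warn(`${label} failed, continuing degraded: ${err}`);
    return fallback;
  }
}

// e.g. skill selection fails → run with empty context:
// const skills = await degradeGracefully(selectSkills, [], "skill selection");
```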
Changes should be the smallest diff that fixes the issue. No drive-by cleanups, no speculative abstractions, no features beyond what was asked. All changes must pass npm test before commit. The enhance loop reverts on test failure — this is a feature, not a limitation.
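A sketch of that test gate, assuming the enhance loop discards the working tree on a red run (the exact git commands are illustrative):

```typescript
import { execSync } from "node:child_process";

// Gate every commit on npm test; on failure, roll the working tree back.
// The revert is the safety net that lets the enhance loop move fast.
function commitIfGreen(message: string): void {
  try {
    execSync("npm test", { stdio: "inherit" });
    execSync(`git commit -am ${JSON.stringify(message)}`);
  } catch {
    execSync("git reset --hard"); // revert on test failure: a feature, not a limitation
  }
}
```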