9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,15 @@ Format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

---

## [Unreleased]

### Fixed
- `ecluse down` in tmux mode now kills the entire pane process group, not just the pane's foreground shell. Previously, multi-level child chains (`sh → pnpm → node → vite`, plus anything that calls `setsid()` like Cloudflare workerd) survived as orphans adopted by `launchd`/`init`, holding their ports indefinitely. Each orphan held 4-8 ports; after a few `up`/`down` cycles the next `ecluse up` would silently land on a port already held by an orphan and serve a different worktree's content. The same TERM→KILL grace pattern that was applied to the nohup path in PR #18 now applies to tmux. (#30)
- `ecluse flush` now sweeps every process whose cwd is inside a worktree (`lsof +d <worktree>`) AND every listener on a configured port (`base_port + slot*slot_stride` and `extra_ports[].base_port + slot*slot_stride` across all `max_slots`), killing each with TERM→KILL grace. The flush confirmation prompt warns that editors/shells with files open in worktrees will be killed; `--yes` bypass for CI is unchanged. (#30)
- `ecluse status` detects when the configured port is being served by a different process than the recorded PID (or any of its descendants). The service is flagged `✗ wrong owner (PID N)` instead of being silently reported as healthy. `--json` output gains `listener_pid` and `wrong_owner` fields. Exit code semantics unchanged: a wrong-owner row trips the existing `exit 1` path. (#30)

---

## [0.3.1] — 2026-06-15

### Fixed
4 changes: 4 additions & 0 deletions docs/src/limits.md
@@ -24,6 +24,10 @@ ecluse down feat-foo --keep-worktree
ecluse up feat-foo --reuse-worktree
```

Teardown kills the whole process group of each spawned service with a TERM→KILL grace period (2s), so wrapper chains like `sh → pnpm → node → vite` are killed in their entirety, not just the outermost wrapper. This applies to both the `tmux` and `nohup` process managers as of 0.3.2. Services that call `setsid()` to leave the process group (rare) escape this and require `ecluse flush`, which additionally sweeps every process whose cwd is inside a worktree and every listener on a configured port.
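
The TERM→KILL grace pattern described above can be sketched as follows. This is a minimal illustration, not ecluse's actual implementation: the `ProcessGroups` trait, the `Signal` enum, `kill_group_with_grace`, and the in-memory `FakeGroup` are all hypothetical names introduced here so the control flow can be shown without touching real processes. A real backend would presumably send the signals with something like `killpg(2)`.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Signal {
    Term,
    Kill,
}

/// Hypothetical abstraction over the OS (not part of ecluse), so the
/// pattern can be exercised without spawning real process groups.
trait ProcessGroups {
    fn signal(&mut self, pgid: u32, sig: Signal);
    fn is_alive(&self, pgid: u32) -> bool;
}

/// Send TERM to the group, poll until it exits or `grace` elapses,
/// then escalate to KILL. Returns the last signal that was sent.
fn kill_group_with_grace<P: ProcessGroups>(
    os: &mut P,
    pgid: u32,
    grace: Duration,
    poll: Duration,
) -> Signal {
    os.signal(pgid, Signal::Term);
    let deadline = Instant::now() + grace;
    while os.is_alive(pgid) {
        if Instant::now() >= deadline {
            os.signal(pgid, Signal::Kill);
            return Signal::Kill;
        }
        sleep(poll);
    }
    Signal::Term
}

/// In-memory stand-in for a process group (assumption, for illustration).
struct FakeGroup {
    ignores_term: bool,
    alive: bool,
    sent: Vec<Signal>,
}

impl ProcessGroups for FakeGroup {
    fn signal(&mut self, _pgid: u32, sig: Signal) {
        self.sent.push(sig);
        // KILL always ends the group; TERM only if the group honours it.
        if sig == Signal::Kill || !self.ignores_term {
            self.alive = false;
        }
    }

    fn is_alive(&self, _pgid: u32) -> bool {
        self.alive
    }
}
```

With the 2s grace mentioned above, a call would look like `kill_group_with_grace(&mut os, pgid, Duration::from_secs(2), Duration::from_millis(100))`; the 100ms poll interval is an arbitrary choice for the sketch.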

`ecluse status` flags ports bound by a process other than the one ecluse recorded (or one of its descendants), showing `✗ wrong owner (PID N)` instead of `✓ up`. This catches stale orphans hijacking a session's port even when the session's own recorded PID is still alive.
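
The ownership rule can be sketched as a pure function. This is an illustrative model under stated assumptions, not the code ecluse ships: the `parents` map (pid to parent pid) is a hypothetical stand-in for a real process-table lookup, and `is_descendant` and `wrong_owner` here are simplified re-statements of the rule "a listener exists and is neither the recorded PID nor one of its descendants".

```rust
use std::collections::HashMap;

/// Walk parent links upward from `candidate`; true if `ancestor` is reached.
/// `parents` maps pid -> parent pid (hypothetical, for illustration only).
fn is_descendant(parents: &HashMap<u32, u32>, ancestor: u32, candidate: u32) -> bool {
    let mut current = candidate;
    // Bound the walk so a malformed (cyclic) table cannot loop forever.
    for _ in 0..parents.len() {
        match parents.get(&current) {
            Some(&parent) if parent == ancestor => return true,
            Some(&parent) => current = parent,
            None => return false,
        }
    }
    false
}

/// True when something is listening on the port and that listener is
/// neither the recorded PID nor a descendant of it. PID 0 is treated as
/// "unknown listener" and never flagged.
fn wrong_owner(parents: &HashMap<u32, u32>, stored: u32, listener: Option<u32>) -> bool {
    match listener {
        Some(actual) => {
            actual != 0 && actual != stored && !is_descendant(parents, stored, actual)
        }
        None => false,
    }
}
```

For a recorded wrapper PID 100 with the chain 100 → 200 → 300, a listener at PID 300 is accepted as a descendant, while a listener whose parent chain leads to PID 1 is flagged.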

## `command` requires the app to read its port from the environment

ecluse injects the full `.env.ecluse` contents — `PORT`, `ECLUSE_SLOT`, `ECLUSE_SLUG`, `ECLUSE_MODE`, all `ECLUSE_<NAME>_PORT` vars, and any `port_env` aliases — directly into the environment of the spawned process. There is no separate sourcing step; the same map written to `.env.ecluse` is passed to the child process before exec. This only fails if the app ignores the environment entirely:
27 changes: 27 additions & 0 deletions skills/ecluse/SKILL.md
@@ -586,6 +586,33 @@ ecluse status <your-slug>
3. **Never run `lsof -ti TCP:<port> | xargs kill` blind** — see "Killing services safely". Use `ecluse whose-pid` to verify ownership before any manual kill.
4. **Consider `slot_stride = 10` in `.ecluse.toml`** for visually distinct adjacent-slot ports (3010, 3020, 3030 instead of 3001, 3002, 3003). Doesn't prevent the root cause but makes mistakes harder.
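
To make the stride effect concrete, here is a small sketch of the port layout, assuming the `base_port + slot * slot_stride` formula with slots numbered from 1 and a base port of 3000 (the base is inferred from the example ports above, not taken from any config). `slot_port` and `configured_ports` are hypothetical helper names, not ecluse functions.

```rust
/// Port for a given slot, following `base_port + slot * slot_stride`.
/// Saturating arithmetic keeps the result inside the u16 port range.
fn slot_port(base_port: u16, slot: u16, slot_stride: u16) -> u16 {
    base_port.saturating_add(slot.saturating_mul(slot_stride))
}

/// Every port a service can be assigned across slots 1..=max_slots.
fn configured_ports(base_port: u16, slot_stride: u16, max_slots: u16) -> Vec<u16> {
    (1..=max_slots)
        .map(|slot| slot_port(base_port, slot, slot_stride))
        .collect()
}
```

Under these assumptions, `configured_ports(3000, 1, 3)` yields 3001, 3002, 3003, and `configured_ports(3000, 10, 3)` yields 3010, 3020, 3030.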

### Wrong content served on the configured URL after multiple up/down cycles

**Symptom:** the user navigates to `http://localhost:7301` expecting slot 1, but sees slot 4's branch instead. `ecluse status` reports the slot 1 service as healthy. Restarting only the affected session doesn't fix it — the wrong content keeps appearing on the configured port.

**Root cause (fixed in 0.3.2+):** an orphan from a previous session is holding the port. Common cause: pnpm/npm wrapper chains where the actual server is a grandchild (`sh → pnpm → node → vite`) — under 0.3.1 and earlier, `ecluse down` killed only the outer wrapper and the actual server reparented to `launchd`/`init`, surviving indefinitely and holding 4-8 ports each. After several `up`/`down` cycles these orphans accumulated and silently collided with new sessions.

**Detection:** `ecluse status` in 0.3.2+ flags this directly:

```
SERVICE TYPE PORT STATUS WINDOW
backoffice native 7301 ✗ wrong owner (PID 81906) backoffice
```

The `wrong owner` row means: the stored PID (or its descendants) is NOT what's currently listening on 7301 — something else is. JSON output gains `listener_pid` and `wrong_owner` fields. Exit code is 1 (same as `✗ down`).

**Recovery on any version:**

```bash
ecluse whose-pid <listener-pid> # confirm it's an orphan, not another session
# If unowned by any ecluse session:
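# `kill -- -N` targets process group N, so N must be the PGID; look it up with
# `ps -o pgid= -p <listener-pid>` if the listener is not the group leader.
kill -- -<listener-pid>          # kill the whole process group (the `-` prefix)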
# OR, the recovery hammer (kills everything in worktrees + every configured port):
ecluse flush --yes
```

**Prevention:** upgrade to 0.3.2+. The tmux teardown path now kills the whole process group (TERM→KILL grace), matching what the nohup path already did. `ecluse flush` also sweeps both the worktree cwd and every configured port to clean up orphans that escaped a previous version's teardown.

### Docker not running

```bash
200 changes: 190 additions & 10 deletions src/main.rs
@@ -437,6 +437,65 @@ mod tests {
// Single char after sanitization → invalid slug
assert!(sanitize_to_slug("a").is_err());
}

// ── status_str ────────────────────────────────────────────────────────────

fn svc_status(
managed: bool,
healthy: bool,
wrong_owner: bool,
listener_pid: Option<u32>,
) -> ServiceStatus {
ServiceStatus {
name: "api".into(),
kind: "native",
port: Some(3001),
healthy,
managed,
pid: Some(42),
tmux_window: None,
listener_pid,
wrong_owner,
}
}

#[test]
fn status_str_unmanaged_shows_dash() {
let s = svc_status(false, false, false, None);
assert_eq!(status_str(&s), "\u{2014}");
}

#[test]
fn status_str_healthy_managed_shows_up() {
let s = svc_status(true, true, false, None);
assert_eq!(status_str(&s), "\u{2713} up");
}

#[test]
fn status_str_unhealthy_managed_shows_down() {
let s = svc_status(true, false, false, None);
assert_eq!(status_str(&s), "\u{2717} down");
}

#[test]
fn status_str_wrong_owner_with_listener_pid_shows_pid() {
let s = svc_status(true, false, true, Some(99999));
assert_eq!(status_str(&s), "\u{2717} wrong owner (PID 99999)");
}

#[test]
fn status_str_wrong_owner_without_listener_pid() {
let s = svc_status(true, false, true, None);
assert_eq!(status_str(&s), "\u{2717} wrong owner");
}

#[test]
fn status_str_wrong_owner_takes_precedence_over_healthy() {
// A service can simultaneously have its stored PID alive AND a
// different process bound to its port. `wrong_owner` wins.
let s = svc_status(true, true, true, Some(99999));
assert_eq!(status_str(&s), "\u{2717} wrong owner (PID 99999)");
}
}

/// Sanitize a branch name or slug into a valid ecluse slug + original branch pair.
@@ -2001,6 +2060,9 @@ fn cmd_flush(args: cli::FlushArgs) -> Result<()> {
if !args.yes {
print!(
"This will destroy all ecluse sessions, worktrees, and running services.\n\
It will also KILL every process with a file open inside the worktrees \
(including editors, shells, and `tail -f` against worktree logs) and \
every process listening on a configured port.\n\
There is no undo. Continue? [y/N] "
);
std::io::stdout().flush()?;
@@ -2071,6 +2133,72 @@ fn cmd_flush(args: cli::FlushArgs) -> Result<()> {
}
}

// Step 3a: sweep every process whose cwd is inside a worktree. Step 1
// killed services tracked in state.json; this catches detached descendants
// (workerd, vite plugins that setsid()) and processes that crashed out of
// a recorded session. Runs BEFORE worktree removal so git worktree remove
// doesn't race a live process holding file handles.
let worktree_dir_path = root.join(&config.worktree_dir);
if worktree_dir_path.exists() {
log.step("Sweeping stray processes with cwd in worktrees...");
if let Ok(entries) = std::fs::read_dir(&worktree_dir_path) {
for entry in entries.flatten() {
let path = entry.path();
if !path.is_dir() {
continue;
}
for pid in sync::pids_in_directory(&path) {
// Skip our own pid: flush runs from inside the repo and would
// otherwise kill itself on the first sweep.
if pid == std::process::id() {
continue;
}
log.detail(&format!(
" kill -TERM -- -{} (cwd {})",
pid,
path.display()
));
process::kill_process_group_with_grace(pid);
}
}
}
}

// Step 3b: sweep every listener on any port the config can allocate. This
// catches orphans that no longer have an open file inside the worktree
// (e.g. a daemonized process that chdir'd to /) but are still holding a
// port from the configured range.
log.step("Sweeping listeners on configured ports...");
let mut swept_listener_pids: std::collections::HashSet<u32> = Default::default();
for svc in &config.services {
for slot in 1..=config.max_slots {
// Primary port (covers host_port override).
let primary = svc.port(slot, config.slot_stride);
// Extra ports (debugger sockets, secondary listeners).
let extras: Vec<u16> = svc
.extra_ports
.iter()
.map(|ep| {
ep.base_port.saturating_add(
(slot as u16).saturating_mul(config.slot_stride.max(1) as u16),
)
})
.collect();
for port in std::iter::once(primary).chain(extras) {
if let Some(pid) = validate::port_listener(port) {
if pid == 0 || pid == std::process::id() {
continue;
}
if !swept_listener_pids.insert(pid) {
continue;
}
log.detail(&format!(" kill -TERM -- -{} (port {})", pid, port));
process::kill_process_group_with_grace(pid);
}
}
}
}

// Step 4: remove all worktrees under worktree_dir.
let worktree_dir = root.join(&config.worktree_dir);
if worktree_dir.exists() {
@@ -2130,6 +2258,35 @@ struct ServiceStatus {
managed: bool,
pid: Option<u32>,
tmux_window: Option<String>,
/// PID of whatever process is actually listening on `port`, if any.
/// Only populated for native services; docker port mappings are owned by
/// the daemon, not the container process, so the check doesn't apply.
listener_pid: Option<u32>,
/// True iff a listener is bound to `port` AND that listener is neither
/// `pid` nor a descendant of it. This typically means a stale orphan from
/// a previous session has hijacked the port: `ecluse status` reports the
/// service as down even though something IS responding to requests.
wrong_owner: bool,
}

/// Human-readable status string for a service row. Extracted from cmd_status
/// so the wrong-owner branch can be unit-tested.
fn status_str(s: &ServiceStatus) -> String {
if !s.managed {
"\u{2014}".into() // — port-only, not ecluse-managed
} else if s.wrong_owner {
// A different process owns the configured port — likely an orphan from
// a previous session. The service is "down" from ecluse's perspective
// even if something IS responding.
match s.listener_pid {
Some(pid) => format!("\u{2717} wrong owner (PID {})", pid),
None => "\u{2717} wrong owner".into(),
}
} else if s.healthy {
"\u{2713} up".into()
} else {
"\u{2717} down".into()
}
}

#[derive(Tabled)]
@@ -2246,14 +2403,42 @@ fn cmd_status(args: cli::StatusArgs) -> Result<()> {
// Port-allocation-only services (no command) are never spawned by
// ecluse — don't report them as down.
let managed = svc.command.is_some();

// Listener identity check: if SOME process is bound to the expected
// port and it's neither this service's recorded PID nor a descendant
// of it, the port is being served by an orphan from a previous
// session (or unrelated software). Surface this rather than silently
// reporting healthy=true — the service is technically alive but the
// user is hitting the wrong process.
let (listener_pid, wrong_owner) = if managed {
match (port, pid) {
(Some(p), Some(stored)) => match validate::port_listener(p) {
Some(actual)
if actual != stored
&& actual != 0
&& !whose_pid::is_descendant(stored, actual) =>
{
(Some(actual), true)
}
other => (other, false),
},
_ => (None, false),
}
} else {
(None, false)
};
let healthy_with_owner_check = healthy && !wrong_owner;

statuses.push(ServiceStatus {
name: svc.name.clone(),
kind: "native",
port,
- healthy: healthy || !managed,
+ healthy: healthy_with_owner_check || !managed,
managed,
pid,
tmux_window,
listener_pid,
wrong_owner,
});
}

@@ -2272,6 +2457,8 @@ fn cmd_status(args: cli::StatusArgs) -> Result<()> {
managed: true,
pid: None,
tmux_window: None,
listener_pid: None,
wrong_owner: false,
});
}

@@ -2289,6 +2476,8 @@ fn cmd_status(args: cli::StatusArgs) -> Result<()> {
"managed": s.managed,
"pid": s.pid,
"tmux_window": s.tmux_window,
"listener_pid": s.listener_pid,
"wrong_owner": s.wrong_owner,
})
})
.collect();
@@ -2318,15 +2507,6 @@ fn cmd_status(args: cli::StatusArgs) -> Result<()> {
if statuses.is_empty() {
println!("No services defined in .ecluse.toml.");
} else {
- let status_str = |s: &ServiceStatus| -> String {
- if !s.managed {
- "\u{2014}".into() // — port-only, not ecluse-managed
- } else if s.healthy {
- "\u{2713} up".into()
- } else {
- "\u{2717} down".into()
- }
- };
let port_str = |s: &ServiceStatus| -> String {
s.port.map(|p| p.to_string()).unwrap_or_else(|| "-".into())
};