Summary
When a remote Codex task fails before/around model execution, e.g.

> Task failed: Codex thread entered systemError
> Task failed: Selected model is at capacity. Please try a different model.

continuing from the HAPI UI can lose the previous conversation/task context. The next user message is treated as a fresh request, and the agent no longer has the earlier instruction/context.
Example observed flow:
- User asks (in Chinese): "现在给我一个完整的表格,关于这些方法的数据" ("Now give me a complete table with the data for these methods.")
- Codex task fails with the capacity/systemError error.
- User retries/sends the message again.
- Agent answers as if it has no previous context: "可以,但我现在还不知道“这些方法”具体指哪些。" ("Sure, but I don't yet know which specific methods 'these methods' refers to.")
Expected behavior
After a task fails due to a transient model/provider error, HAPI should preserve enough session/thread state so a retry or follow-up continues the same Codex thread/context, or at least offers an explicit “resume/retry with previous context” path.
Actual behavior
The retry/follow-up appears to lose the prior task context and behaves like a new empty interaction.
Relevant code/data points
I traced the current runner/session path and noticed most of the active session tracking is in runner memory only:
| Area | File | Data | Persistence | Risk |
| --- | --- | --- | --- | --- |
| Runner local state | `cli/src/persistence.ts` | `pid`, `httpPort`, `version`, `startedWithApiUrl`, `startedWithMachineId`, token hash, heartbeat | `~/.hapi/runner.state.json` | Only runner process metadata, not full session/thread state |
| Active sessions | `cli/src/runner/run.ts` | `pidToTrackedSession` | Memory only | Lost on runner restart/crash |
| Spawn awaiters | `cli/src/runner/run.ts` | `pidToAwaiter`, `pidToErrorAwaiter` | Memory only | Lost on failure/restart |
| Session webhook | `cli/src/runner/run.ts` | `happySessionId`, metadata, PID | Memory map update | Not enough if not persisted |
| Resume path | `cli/src/runner/run.ts` | `resumeSessionId` passed to agent command | Request only | Depends on caller preserving/providing the resume id |
| Codex command | `cli/src/commands/codex.ts` | Parses `hapi codex resume <id>` | No | OK as a CLI path, but HAPI retry may not re-use it correctly |
| Codex session | `cli/src/codex/loop.ts`, `cli/src/codex/session.ts` | `resumeSessionId ?? null` as `sessionId` | No | Session can resume only if the id is still known |
The runner README also documents that `runner.state.json` only stores runner process state, not session mapping.
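Read together, the rows above describe a single failure mode: the resume id lives only in runner memory, and the `resumeSessionId ?? null` fallback silently turns a missing id into a fresh session. A minimal sketch of that interaction (helper and variable names here are hypothetical; only the `?? null` fallback mirrors the actual code in `cli/src/codex/session.ts`):

```typescript
// In-memory-only tracking, analogous to pidToTrackedSession in
// cli/src/runner/run.ts: entries vanish when the runner restarts or crashes.
const pidToResumeId = new Map<number, string>();

// Mirrors the `resumeSessionId ?? null` fallback: an undefined resume id
// silently becomes a brand-new session with no prior context.
function startCodexSession(resumeSessionId: string | undefined): string | null {
  return resumeSessionId ?? null;
}

// On retry, the resume id is looked up in memory. If the map entry is gone
// (restart, crash, or a failure path that never stored it), the retry
// resolves to null and the follow-up is answered without the earlier thread.
function retryForPid(pid: number): string | null {
  return startCodexSession(pidToResumeId.get(pid));
}
```

This is why the retry behaves like "a new empty interaction": nothing in the failure path forces the caller to re-supply the id, and nothing persistent can recover it.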
Possible fix direction
A robust fix may involve one or more of:
- Persist the session/thread mapping (`happySessionId`, the provider-specific session/thread id such as the Codex resume id, PID, cwd, agent, model settings) beyond the in-memory `pidToTrackedSession`.
- On task failure (`systemError`, capacity, and other transient provider errors), keep the same Codex thread/session id and expose a retry against that same id.
- Make the HAPI UI/backend send `resumeSessionId` when retrying a failed Codex task if a prior session/thread id exists.
- Add diagnostics/logging around failed Codex tasks to show whether a resume id was available and whether it was used.
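For the first bullet, a minimal sketch of what a persisted session map could look like, modeled on the existing `runner.state.json` pattern. The file name, interface fields, and helper functions below are all hypothetical, not actual HAPI APIs:

```typescript
import * as fs from "fs";

// Hypothetical shape of the data currently kept only in pidToTrackedSession.
interface PersistedSession {
  happySessionId: string;
  codexResumeId: string | null; // provider-specific thread id (Codex resume id)
  pid: number;
  cwd: string;
  agent: string;
}

// Write the map atomically (temp file, then rename) so a crash mid-write
// never leaves a torn state file, following common state-file practice.
function saveSessionMap(file: string, sessions: PersistedSession[]): void {
  const tmp = file + ".tmp";
  fs.writeFileSync(tmp, JSON.stringify(sessions, null, 2));
  fs.renameSync(tmp, file);
}

// Load tolerantly: a missing or corrupt file just means "no known sessions".
function loadSessionMap(file: string): PersistedSession[] {
  try {
    return JSON.parse(fs.readFileSync(file, "utf8"));
  } catch {
    return [];
  }
}

// On retry of a failed task, recover the Codex resume id if one was recorded,
// so the retry can continue the same thread instead of starting fresh.
function resumeIdFor(
  sessions: PersistedSession[],
  happySessionId: string
): string | null {
  return (
    sessions.find((s) => s.happySessionId === happySessionId)?.codexResumeId ??
    null
  );
}
```

With something like this, a retry after `systemError`/capacity failures could look up the prior `codexResumeId` even across a runner restart, rather than depending on an in-memory map that may already be gone.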
Environment
- HAPI CLI version observed: 0.17.1
- Agent: Codex
- Start path observed in logs: `hapi codex resume <resumeSessionId> --hapi-starting-mode remote --started-by runner ...`
- Runner keeps process metadata in `~/.hapi/runner.state.json`; active session maps are in-memory.