FlowGuard

A model-first guardrail for AI coding agents: it checks risky state transitions before small edits grow into large-project maintenance debt.

Public release	Schema	Runtime	License
`v0.43.0`	`1.0`	Python standard library only	MIT

English lead content comes first; a Chinese mirror follows below.

Why FlowGuard Exists

AI coding agents are useful because they can move quickly through small edits. The failure mode appears when the project stops being small: the agent can keep adding code while the real workflow state is already wrong.

Common symptoms look familiar:

a retry path runs twice and creates duplicate side effects;
branches multiply until nobody knows which path still owns the state;
tests pass, but they no longer prove the claim being made;
a child fix is fresh, while the parent plan still trusts stale evidence;
a UI has visible buttons, but the launch-to-terminal journey has no valid recovery path;
a release, README, or "done" claim survives after later code, docs, tests, or peer writes invalidated it.

FlowGuard is built for that problem. It makes the risky transition explicit before the agent acts, runs a small executable state model, and turns hidden failure paths into counterexamples that can change the next engineering step.

It does not promise that software has no bugs. It helps prevent a specific maintenance disaster: AI agents continuing from an invalid state and piling more code, tests, or public claims on top of it.

What FlowGuard Does

FlowGuard is a lightweight Python toolkit and AI-agent skill layer for model-first workflow checks. Any agent can use the method; Codex-compatible skills are provided as ready-to-install guidance, not as the boundary of the project.

The practical loop is:

risky AI action -> explicit state model -> executable checks
-> counterexample trace -> revise plan, code, tests, UI, or claim

Instead of asking an agent to "be careful", FlowGuard asks the agent to name the state, inputs, outputs, side effects, ownership boundaries, and evidence gates that decide whether the next action is safe.
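
As a rough illustration of what "naming" these items can look like, here is a minimal sketch in plain Python. `TransitionDeclaration`, `missing_fields`, and the example values are hypothetical names chosen for this sketch; they are not part of the FlowGuard API.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TransitionDeclaration:
    """Hypothetical record of what an agent names before it acts."""

    state: tuple[str, ...]
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]
    side_effects: tuple[str, ...]
    owners: tuple[str, ...]
    evidence_gates: tuple[str, ...]


def missing_fields(decl: TransitionDeclaration) -> list[str]:
    """Return the names of declaration fields that were left empty."""
    return [f.name for f in fields(decl) if not getattr(decl, f.name)]


# Illustrative declaration for a retry path; every value is made up.
retry_job = TransitionDeclaration(
    state=("processed", "side_effects"),
    inputs=("job request",),
    outputs=("job status",),
    side_effects=("charge customer",),
    owners=(),  # nobody has been named as owner of the side effect yet
    evidence_gates=("retry test passed on current code",),
)

print(missing_fields(retry_job))  # ['owners']
```

An empty field is a cheap early signal: in this sketch the agent has not said who owns the side effect, so the transition is not ready to become code.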

The Core Model

Each function block is represented as:

Input x State -> Set(Output x State)

That small shape is enough to expose many large-project problems:

repeated inputs that should be idempotent;
dead branches that return no legal output;
multiple possible outputs that need explicit ownership;
stale evidence after later artifact changes;
remembered maintenance obligations that reopen when the same surface changes;
parent/child model drift;
invalid final claims after skipped or scoped validation.
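
The shape above can be written down directly as a function that returns a set of `(output, new_state)` pairs. The following is a minimal sketch in plain Python, without the FlowGuard library; `process_job` and `cancel_job` are hypothetical blocks invented here to show an idempotent repeat and a dead branch.

```python
from typing import FrozenSet, Tuple

State = FrozenSet[str]      # job ids that have already been processed
Result = Tuple[str, State]  # (output, new state)


def process_job(job_id: str, state: State) -> FrozenSet[Result]:
    """Idempotent block: a repeated job id leaves the state unchanged."""
    if job_id in state:
        return frozenset({("already_processed", state)})
    return frozenset({("processed", state | {job_id})})


def cancel_job(job_id: str, state: State) -> FrozenSet[Result]:
    """Block with a dead branch: an unknown job id yields no legal output."""
    if job_id in state:
        return frozenset({("cancelled", state - {job_id})})
    return frozenset()


empty: State = frozenset()
[(first_output, after_first)] = process_job("A", empty)
[(second_output, after_second)] = process_job("A", after_first)

print(first_output, second_output, after_first == after_second)
# processed already_processed True
print(len(cancel_job("B", empty)))
# 0
```

An empty result set is how a dead branch shows up in this representation, and a result set with more than one element is where the question of output ownership starts.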

FlowGuard explores finite traces inside the declared model and checks invariants, scenarios, progress properties, conformance expectations, evidence freshness, and closure boundaries. When a check fails, the important output is the counterexample path: the concrete state sequence that shows why the current plan should not continue unchanged.
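
To make the counterexample idea concrete, the sketch below runs a tiny bounded breadth-first search in plain Python. It is not FlowGuard's explorer and makes no claim about how the library is implemented; `find_counterexample`, `buggy_process`, and `fixed_process` are hypothetical names used only for this illustration.

```python
from collections import deque


def buggy_process(job_id, state):
    """Buggy block: never checks whether the job was already processed."""
    processed, side_effects = state
    return [("processed", (processed | {job_id}, side_effects + 1))]


def fixed_process(job_id, state):
    """Fixed block: a repeated job id is deduplicated."""
    processed, side_effects = state
    if job_id in processed:
        return [("already_processed", state)]
    return [("processed", (processed | {job_id}, side_effects + 1))]


def side_effect_once(state):
    """Invariant: no more side effects than distinct processed jobs."""
    processed, side_effects = state
    return side_effects <= len(processed)


def find_counterexample(block, initial, inputs, invariant, max_len):
    """Search input sequences up to max_len; return the first failing trace."""
    queue = deque([(initial, [])])
    while queue:
        state, trace = queue.popleft()
        if not invariant(state):
            return trace  # list of (input, output, new_state) steps
        if len(trace) >= max_len:
            continue
        for inp in inputs:
            for output, new_state in block(inp, state):
                queue.append((new_state, trace + [(inp, output, new_state)]))
    return None


initial = (frozenset(), 0)
bad_trace = find_counterexample(buggy_process, initial, ["A"], side_effect_once, 2)
good_trace = find_counterexample(fixed_process, initial, ["A"], side_effect_once, 2)

for step in bad_trace:
    print(step)
print(good_trace)
```

The buggy block fails on the second delivery of job `A`, and the returned trace is the concrete sequence of steps that leads there. The fixed block yields no counterexample within the same bound, which only says the declared model passed for sequences of that length.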

Start Small

Most useful FlowGuard work can start thin:

choose one risky boundary
-> name Input, State, Output, side effects, and owners
-> write one invariant or scenario
-> run the check
-> inspect the counterexample
-> fix the model, plan, code, tests, UI, or claim
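
A scenario can be as thin as one input sequence plus the outputs the plan expects. The sketch below is plain Python under that assumption; `run_scenario` is a hypothetical helper, not a FlowGuard function, and the block is simplified to return a single output.

```python
def process_job(job_id, processed):
    """Deterministic block: return (output, new_state) for one job request."""
    if job_id in processed:
        return "already_processed", processed
    return "processed", processed | {job_id}


def run_scenario(inputs, expected_outputs):
    """Replay inputs from an empty state; return the first mismatch or None."""
    processed = frozenset()
    for step, (job_id, expected) in enumerate(zip(inputs, expected_outputs)):
        actual, processed = process_job(job_id, processed)
        if actual != expected:
            return {
                "step": step,
                "input": job_id,
                "expected": expected,
                "actual": actual,
            }
    return None


# A plan that assumes a retry is processed again disagrees with the model.
mismatch = run_scenario(["A", "A"], ["processed", "processed"])
print(mismatch)

# After the plan is revised to expect deduplication, the scenario passes.
revised = run_scenario(["A", "A"], ["processed", "already_processed"])
print(revised)
```

Here the first run reports a mismatch at step 1, which is the cue to revise either the plan or the model before writing more code.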

Only escalate when the risk boundary demands it: UI topology, code structure, test hierarchy, model/test alignment, parent/child model mesh, staged release evidence, architecture reduction, existing-model ownership, or model-miss repair.

What You Can Design And Verify

Work type	What FlowGuard helps design	What it checks
Development process	staged route, legal next actions, validation gates, stale-evidence reset, peer-write invalidation, done and release readiness	skipped gates, stale validation, progress-only evidence, invalid completion claims
UI flow	persistent regions, contextual panels, visible controls, visible surface, overlays, recovery paths, display ownership, duplicate-control rules	launch-to-terminal journeys, unavailable controls, missing disabled reasons, missing recovery, duplicate actions, evidence-kind gaps, warning/error escalation
Code structure	module split, facade boundary, state owner, side-effect owner, config owner, validation owner	ownership leaks, dependency cycles, facade drift, config drift, missing parity evidence
Architecture reduction	observable contract, duplicate implementation candidates, safe target action	whether handlers, adapters, branches, modules, or validation layers can shrink without changing observable behavior
Test strategy	routine and release test layers, parent/child suites, timeout boundaries, stale and hidden evidence rules	skipped tests, release-only evidence, broad/slow direct checks, stale passes, wrong-provenance evidence
Model mesh and bug repair	parent/child boundaries, reattachment gates, sibling-impact review, same-class bad cases	child evidence freshness, parent consumption, analogous defect risks, same-class test evidence
Release and public claims	evidence ledger, claim-chain boundary, publish readiness	whether the public claim is supported by current model, code, test, mesh, freshness, and risk evidence

This is the core product: FlowGuard turns a vague workflow, UI journey, refactor, test strategy, or release process into a small state machine with explicit failure traces. The counterexample is not just a bug report; it is design feedback that says which state, gate, owner, or evidence rule must change before work continues.
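
Stale evidence, which appears in several rows of the table, can be pictured as a fingerprint mismatch. The sketch below is one possible way to model it in plain Python and is not FlowGuard's ledger format; `Evidence`, `fingerprint`, `stale_evidence`, and the file name are hypothetical.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(content: str) -> str:
    """Hash artifact content so later edits can be detected."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Evidence:
    claim: str
    artifact: str              # name of the artifact the evidence covers
    artifact_fingerprint: str  # fingerprint at the time the evidence was made


def stale_evidence(ledger, artifacts):
    """Return evidence whose recorded fingerprint no longer matches."""
    return [
        item
        for item in ledger
        if fingerprint(artifacts[item.artifact]) != item.artifact_fingerprint
    ]


artifacts = {"retry.py": "def retry(): return 1\n"}
ledger = [
    Evidence("retry tests pass", "retry.py", fingerprint(artifacts["retry.py"]))
]
fresh_claims = [item.claim for item in stale_evidence(ledger, artifacts)]

# A later edit changes the artifact without re-running the tests.
artifacts["retry.py"] = "def retry(): return 2\n"
stale_claims = [item.claim for item in stale_evidence(ledger, artifacts)]

print(fresh_claims)  # []
print(stale_claims)  # ['retry tests pass']
```

The test result itself has not changed, but it no longer describes the current artifact, so any claim built on it has to be re-validated.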

What FlowGuard Is Not

FlowGuard is not an LLM wrapper, a prompt trick, a probability engine, a Monte Carlo simulator, or a replacement for ordinary tests.

It is a structural preflight layer. Tests still verify production code. Code review still matters. UI polish and browser/device behavior still need real UI review. FlowGuard sits earlier: it asks whether the agent's intended transition is coherent before the transition becomes code, tests, UI, or a public confidence claim.

When To Use It

Use FlowGuard when order, state, ownership, side effects, UI availability, role handoff, or evidence freshness can change whether the plan is actually safe.

Good fits:

AI-agent coding work with multiple stages or handoffs;
retries, deduplication, cache refresh, ingestion, or repeated job processing;
refactors where public entrypoints must remain compatible;
UI flows where visible controls do not guarantee legal recovery paths;
tests or release checks where old evidence can be mistaken for current proof;
parent/child models where one green result should not automatically make the parent green.
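
The parent/child case in the last item can be sketched as a simple rule: the parent is green only when every child passed at the revision the parent actually consumed. This is an illustrative plain-Python model, not the FlowGuard mesh API; `ChildResult` and `parent_is_green` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChildResult:
    name: str
    passed: bool
    revision: int  # revision of the child model that produced this result


def parent_is_green(children, consumed):
    """Green only if every child passed at the revision the parent consumed.

    `consumed` maps a child name to the revision the parent last reviewed.
    """
    return all(
        child.passed and consumed.get(child.name) == child.revision
        for child in children
    )


children = [
    ChildResult("retry_model", passed=True, revision=3),  # freshly fixed
    ChildResult("ingest_model", passed=True, revision=1),
]
consumed = {"retry_model": 2, "ingest_model": 1}  # parent still trusts revision 2

before_review = parent_is_green(children, consumed)
consumed["retry_model"] = 3  # parent re-reviews the fixed child
after_review = parent_is_green(children, consumed)

print(before_review, after_review)  # False True
```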

Bad fits:

one-line typo fixes;
formatting-only edits;
tasks where no state, side effect, ordering, or evidence boundary matters;
claims that need statistical truth rather than structural workflow checks.

Quick Start

git clone https://github.com/liuyingxuvka/FlowGuard.git
cd FlowGuard
python -m pip install -e .
python -m flowguard schema-version
python -m flowguard self-review

Adopt FlowGuard into another project so future agents can find the repository, version record, and project rules:

python -m flowguard project-adopt --root <target-project>
python -m flowguard project-audit --root <target-project>
python -m flowguard project-upgrade --root <target-project>

Useful template entry points:

python -m flowguard project-template
python -m flowguard project-adoption-template
python -m flowguard plan-detailing-template
python -m flowguard model-test-alignment-template
python -m flowguard existing-model-preflight-template
python -m flowguard model-similarity-template
python -m flowguard risk-evidence-ledger-template
python -m flowguard maintenance-scan-template
python -m flowguard closure-contract-template
python -m flowguard code-structure-recommendation-template
python -m flowguard ui-flow-structure-template
python -m flowguard development-process-flow-template
python -m flowguard workflow-step-contracts-template
python -m flowguard test-mesh-template
python -m flowguard structure-mesh-template
python -m flowguard topology-hazard-template

Run focused examples:

python examples/flowguard_product_boundary/run_review.py
python examples/hierarchical_model_mesh/run_review.py
python examples/risk_evidence_ledger/run_checks.py
python examples/job_matching/run_checks.py

Minimal Python Sketch

from dataclasses import dataclass, replace

from flowguard import Explorer, FunctionResult, Invariant, Workflow


@dataclass(frozen=True)
class State:
    processed: tuple[str, ...] = ()
    side_effects: int = 0


@dataclass(frozen=True)
class Input:
    job_id: str
    retry: bool = False


class ProcessJob:
    name = "process_job"
    accepted_input_type = Input
    reads = ("processed", "side_effects")
    writes = ("processed", "side_effects")
    input_description = "job request"
    output_description = "job status"
    idempotency = "same job id creates at most one side effect"

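    def apply(self, input_obj: Input, state: State):
        # A job id that is already recorded is a repeat delivery:
        # return the state unchanged so no second side effect is counted.
        if input_obj.job_id in state.processed:
            return [
                FunctionResult(
                    output="already_processed",
                    new_state=state,
                    label="deduplicated_retry",
                )
            ]
        # First delivery: record the job id and count exactly one side effect.
        return [
            FunctionResult(
                output="processed",
                new_state=replace(
                    state,
                    processed=state.processed + (input_obj.job_id,),
                    side_effects=state.side_effects + 1,
                ),
                label="first_processing",
            )
        ]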


workflow = Workflow((ProcessJob(),), name="retry_deduplication")

report = Explorer(
    workflow=workflow,
    initial_states=(State(),),
    external_inputs=(Input("A"),),
    invariants=(
        Invariant(
            name="side_effect_once",
            description="The same job may not create duplicate side effects.",
            predicate=lambda state, trace: state.side_effects <= len(set(state.processed)),
        ),
    ),
    max_sequence_length=2,
    progress_steps=0,
).explore()
print(report.format_text())

Skill Routes

FlowGuard has one kernel and several peer satellite skills. Use the smallest route that owns the actual risk:

Skill route	Use it when
`model-first-function-flow`	ordinary behavior/state modeling, unclear route selection, or cross-route coordination
`flowguard-plan-detailing-compiler`	a rough idea or short plan needs explicit scope, state, side effects, evidence, receipts, rework, and claim boundaries
`flowguard-existing-model-preflight`	existing modeled-system work should first identify model ownership, reuse/extend decisions, and duplicate-boundary risk
`flowguard-development-process-flow`	staged development, archive, publish, release, or done confidence depends on current validation evidence
`flowguard-ui-flow-structure`	UI controls, visible surface, launch-to-terminal journeys, overlays, recovery paths, information ownership, and runnable evidence kinds need modeling
`flowguard-code-structure-recommendation`	a function-flow model should derive module, facade, state-owner, side-effect, config, and validation boundaries
`flowguard-structure-mesh`	a script, package, command, public API, or refactor split needs facade compatibility and parity evidence
`flowguard-test-mesh`	validation is layered, slow, stale, skipped, release-only, or split across child suites
`flowguard-model-test-alignment`	model obligations, code contracts, code-boundary observations, and test evidence need direct comparison
`flowguard-model-mesh`	parent/child model evidence, sibling impact, or oversized model surfaces need mesh governance
`flowguard-model-topology-hazard-review`	a locally green model may still imply repeatable future-use hazards before broad confidence
`flowguard-architecture-reduction`	model-equivalent handlers, adapters, modules, branches, or validation layers may be safely contracted
`flowguard-model-miss-review`	runtime, tests, replay, logs, or manual validation failed after a FlowGuard model passed

Relationship To The Guard Family

Project	Focus
FlowGuard	stateful behavior, process flow, evidence freshness, parent/child model confidence
LogicGuard	claims, evidence, warrants, assumptions, rebuttals, scope, and overclaiming in written reasoning
PhysicsGuard	low-fidelity residual checks and model-building blueprints for physical simulation debugging
FlowPilot	long-running project orchestration and route control for AI-agent software work

Documentation Map

File	Purpose
`docs/concept.md`	short conceptual introduction
`docs/modeling_protocol.md`	core model-first protocol
`docs/api_surface.md`	public Python API overview
`docs/invariant_examples.md`	examples of useful invariants
`docs/development_process_flow.md`	staged development, validation freshness, archive, publish, and release gates
`docs/ui_flow_structure.md`	UI interaction and structure modeling
`docs/code_structure_recommendation.md`	model-derived code structure recommendations
`docs/structure_mesh.md`	refactor and module split governance
`docs/test_evidence_mesh.md`	layered validation and evidence freshness
`docs/model_test_alignment.md`	model obligation and test evidence alignment
`docs/model_mesh_protocol.md`	parent/child model mesh governance
`docs/model_topology_hazard_review.md`	topology-grounded future-use hazard review
`docs/model_similarity_consolidation.md`	model-to-model relation review and consolidation handoffs
`docs/flowguard_closure_contract.md`	closure contract for complete FlowGuard use
`docs/risk_evidence_ledger.md`	risk-to-model-to-code-to-evidence confidence boundary
`docs/runtime_gateway_adoption.md`	runtime gateway adoption levels and critical-state writer inventory

Repository Layout

flowguard/     Core library, review helpers, templates, mesh routes, CLI
examples/      Small executable models and public self-reviews
docs/          Protocols, API notes, examples, and adoption guidance
tests/         Focused regression tests for the public helpers
assets/        README hero image and generation notes

Public Boundary

This repository is designed to be useful as a public starter and reference implementation. It includes the library, examples, protocol docs, public templates, and AI-agent skill material, including Codex-compatible skills.

It does not include private project logs, personal predictive knowledge, credentials, customer data, or claims that every real system is covered by the model. A FlowGuard pass means the declared model obligations passed. It does not mean the whole project is correct.

License

MIT. See LICENSE.

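Chinese Mirror

FlowGuard is a model-first guardrail for AI coding agents: it models and checks risky state transitions before small changes gradually turn into large-project maintenance debt.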

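Why FlowGuard Is Needed

AI coding agents work well on small code, small software, and small changes. The real danger usually appears once the project grows: the agent can keep writing code, but the actual workflow state, evidence, ownership, or release conclusion is already wrong.

Common symptoms include:

a retry path runs twice and produces duplicate side effects;
branches keep multiplying until nobody knows which path still owns the real state;
tests pass, but they no longer prove the current claim;
a child model has just been fixed, while the parent plan still trusts old evidence;
a UI has visible buttons, but the launch-to-terminal journey has no legal recovery path;
a release, README, or done claim remains in place after later code, docs, tests, or peer writes.

FlowGuard is designed for this problem. Before the agent acts, it writes the risky transition as an explicit small state model, runs executable checks, and turns hidden failure paths into counterexamples that can change the next engineering action.

It does not promise bug-free software. It guards against a more specific maintenance disaster: an AI agent pushing forward from an invalid state and piling more code, tests, or public claims onto a wrong plan.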

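What FlowGuard Does

FlowGuard is a lightweight Python toolkit and a general AI-agent skill layer. Any agent can use the method; Codex-compatible skills are only a ready-made way to plug in, not the boundary of the project.

The practical loop is:

risky AI action -> explicit state model -> executable checks
-> counterexample trace -> revise plan, code, tests, UI, or claim

Rather than telling the agent to "be careful", it requires the agent to spell out the state, input, output, side effect, ownership boundary, and evidence gate, because these are what decide whether the next step is safe.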

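The Core Model

Each function block is represented as:

Input x State -> Set(Output x State)

This small structure is enough to expose many large-project problems:

whether repeated inputs should be idempotent;
whether a dead branch has no legal output;
whether multiple possible outputs have explicit ownership;
whether evidence has expired after later artifact changes;
whether remembered maintenance obligations automatically reopen when the same risk surface is changed again;
whether parent/child model evidence has drifted;
whether a final claim is still accepted after skipped or scoped validation.

FlowGuard explores traces inside the declared finite model and checks invariants, scenarios, progress, conformance, evidence freshness, and closure boundaries. On failure, the most important output is the counterexample path: a concrete state sequence that shows why the current plan cannot continue as is.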

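Start Small

Most FlowGuard use can stay thin:

choose one risk boundary
-> name Input, State, Output, side effects, and owners
-> write one invariant or scenario
-> run the check
-> read the counterexample
-> fix the model, plan, code, tests, UI, or claim

Escalate to an advanced route only when the risk boundary truly requires it: UI topology, code structure, test hierarchy, model/test alignment, parent/child model mesh, staged release evidence, architecture reduction, existing-model ownership, or model-miss repair.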

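What It Can Design And Verify

Work type	What FlowGuard helps you design	What it checks
Development process	staged route, legal next actions, validation gates, stale-evidence reset, peer-write invalidation, done/release readiness	skipped gates, old validation, progress-only evidence, invalid completion claims
UI flow	persistent regions, contextual panels, visible controls, visible surface, overlays, recovery paths, display ownership, duplicate-control rules	launch-to-terminal journeys, unavailable controls, missing disabled reasons, missing recovery, duplicate actions, evidence-kind gaps, warning/error escalation
Code structure	module split, facade boundary, state owner, side-effect owner, config owner, validation owner	ownership leaks, dependency cycles, facade drift, config drift, missing parity evidence
Architecture reduction	observable contract, duplicate implementation candidates, safe target action	whether handlers, adapters, branches, modules, or validation layers can shrink without changing observable behavior
Test strategy	routine/release test layers, parent/child suites, timeout boundaries, rules for old and hidden evidence	skipped tests, release-only evidence, overly broad or slow direct checks, old passes, evidence from the wrong source
Model mesh and bug repair	parent/child boundaries, reattachment gates, sibling-impact review, same-class bad cases	child evidence freshness, parent consumption, analogous defect risks, same-class test evidence
Release and public claims	evidence ledger, claim-chain boundary, publish readiness	whether the public claim is supported by current model, code, test, mesh, freshness, and risk evidence

This is FlowGuard's real core product: it turns a vague workflow, UI journey, refactor, test strategy, or release process into a small state machine and gives explicit failure paths. A counterexample is not merely a bug report but design feedback: it tells you which state, gate, owner, or evidence rule must change first before work can continue.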

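What FlowGuard Is Not

FlowGuard is not an LLM wrapper, not a prompt trick, not a probability engine, not a Monte Carlo simulator, and not a replacement for ordinary tests.

It is a structural preflight layer. Tests still have to verify the real code, code review still matters, and UI polish and browser/device behavior still need real UI review. FlowGuard acts one step earlier: before a transition becomes code, tests, UI, or a public confidence claim, it checks whether that transition is coherent.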

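When To Use It

Use FlowGuard when order, state, ownership, side effects, UI availability, role handoff, or evidence freshness can change whether the plan is safe.

Good fits:

AI-agent coding work with multiple stages or handoffs;
retries, deduplication, cache refresh, ingestion, or repeated job processing;
refactors where public entrypoints must remain compatible;
UI flows where visible controls do not equal legal recovery paths;
tests or release checks where old evidence can be mistaken for current proof;
parent/child models where one green child should not automatically make the parent green.

Bad fits:

one-line typo fixes;
formatting-only changes;
tasks with no state, side effect, ordering, or evidence boundary;
claims that need statistical fact rather than structural workflow checks.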

Quick Start

git clone https://github.com/liuyingxuvka/FlowGuard.git
cd FlowGuard
python -m pip install -e .
python -m flowguard schema-version
python -m flowguard self-review

Adopt FlowGuard into another project so later agents can see the repository address, version record, and project rules:

python -m flowguard project-adopt --root <target-project>
python -m flowguard project-audit --root <target-project>
python -m flowguard project-upgrade --root <target-project>

Common template entry points:

python -m flowguard project-template
python -m flowguard project-adoption-template
python -m flowguard plan-detailing-template
python -m flowguard model-test-alignment-template
python -m flowguard existing-model-preflight-template
python -m flowguard model-similarity-template
python -m flowguard risk-evidence-ledger-template
python -m flowguard maintenance-scan-template
python -m flowguard closure-contract-template
python -m flowguard code-structure-recommendation-template
python -m flowguard ui-flow-structure-template
python -m flowguard development-process-flow-template
python -m flowguard workflow-step-contracts-template
python -m flowguard test-mesh-template
python -m flowguard structure-mesh-template
python -m flowguard topology-hazard-template

Run the examples:

python examples/flowguard_product_boundary/run_review.py
python examples/hierarchical_model_mesh/run_review.py
python examples/risk_evidence_ledger/run_checks.py
python examples/job_matching/run_checks.py

Skill Architecture

FlowGuard has one kernel and several peer satellite skills. Choose the smallest route that actually owns the current risk:

Skill route	Use it when
`model-first-function-flow`	ordinary behavior/state modeling, unclear route selection, or cross-route coordination
`flowguard-plan-detailing-compiler`	a rough idea or short plan needs explicit scope, state, side effects, evidence, receipts, rework, and claim boundaries
`flowguard-existing-model-preflight`	work in an already modeled system should first find the model owner, the reuse/extend decision, and duplicate-boundary risk
`flowguard-development-process-flow`	staged development, archive, publish, release, or done confidence depends on current validation evidence
`flowguard-ui-flow-structure`	UI controls, visible surface, launch-to-terminal journeys, overlays, recovery paths, information ownership, and runnable evidence kinds need modeling
`flowguard-code-structure-recommendation`	a function-flow model should derive module, facade, state-owner, side-effect, config, and validation boundaries
`flowguard-structure-mesh`	a script, package, command, public API, or refactor split needs facade compatibility and parity evidence
`flowguard-test-mesh`	validation is layered, slow, stale, skipped, run only at release, or spread across child suites
`flowguard-model-test-alignment`	model obligations, code contracts, code-boundary observations, and test evidence need direct comparison
`flowguard-model-mesh`	parent/child model evidence, sibling impact, or oversized model surfaces need mesh governance
`flowguard-model-topology-hazard-review`	a locally green model may still imply recurring future-use risks before broad confidence
`flowguard-architecture-reduction`	model-equivalent handlers, adapters, modules, branches, or validation layers may be safely contracted
`flowguard-model-miss-review`	runtime, tests, replay, logs, or manual validation still failed after a FlowGuard model passed

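Guard Family Relationship

Project	Focus
FlowGuard	stateful behavior, process flow, evidence freshness, parent/child model confidence
LogicGuard	claims, evidence, warrants, assumptions, rebuttals, scope, and overclaiming in written reasoning
PhysicsGuard	low-fidelity residual checks and model-building blueprints for physical simulation debugging
FlowPilot	project orchestration and route control for long-running AI-agent software work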

Documentation Entry Points

File	Purpose
`docs/concept.md`	short conceptual introduction
`docs/modeling_protocol.md`	core model-first protocol
`docs/api_surface.md`	public Python API overview
`docs/invariant_examples.md`	examples of common invariants
`docs/development_process_flow.md`	staged development, validation freshness, archive, publish, and release gates
`docs/ui_flow_structure.md`	UI interaction and structure modeling
`docs/code_structure_recommendation.md`	code structure recommendations derived from the model
`docs/structure_mesh.md`	refactor and module split governance
`docs/test_evidence_mesh.md`	layered validation and evidence freshness
`docs/model_test_alignment.md`	alignment of model obligations and test evidence
`docs/model_mesh_protocol.md`	parent/child model mesh governance
`docs/model_topology_hazard_review.md`	review that infers future-use risks from model topology
`docs/model_similarity_consolidation.md`	model-to-model relation review and consolidation handoffs
`docs/flowguard_closure_contract.md`	closure contract for complete FlowGuard use
`docs/risk_evidence_ledger.md`	risk-to-model-to-code-to-evidence confidence boundary
`docs/runtime_gateway_adoption.md`	runtime gateway adoption levels and critical-state writer inventory

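Repository Structure

flowguard/     Core library, review helpers, templates, mesh routes, CLI
examples/      Small executable models and public self-reviews
docs/          Protocols, API notes, examples, and adoption guidance
tests/         Regression tests for the public helpers
assets/        README hero image and generation notes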

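Public Boundary

This repository is suited to serve as a public starter and reference implementation. It contains the library code, examples, protocol docs, public templates, and general AI-agent skill material, including Codex-compatible skills.

It does not contain private project logs, personal predictive knowledge, credentials, or customer data, and it does not claim that the model covers every real system. A FlowGuard pass only means that the declared model obligations passed; it does not mean the whole project is correct.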

License

MIT. See LICENSE.
