fix(weixin): flock singleton to prevent multi-instance race #2
Merged
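Symptom: a user sends one message in WeChat and the bot replies 2-3 times.

Root cause, in two layers:
1. weixin_bridge.start() unconditionally calls os.unlink(SOCKET_PATH) before bind, so a second weixin_bot.py does not fail on the occupied socket; it takes the socket over instead. The previous instance does not die, though, and keeps long-polling iLink. The two processes then race to handle every inbound message.
2. In deployments with multiple launchd plists (multiple TG bot instances), bot.py's _spawn_weixin_if_configured runs once per bot.py startup and each run spawns its own copy. There is no mutex at the configuration layer.

Fix: at startup, weixin_bot.py first takes fcntl.flock(/tmp/babata-weixin.lock).
- LOCK_EX | LOCK_NB: acquires the lock atomically and returns immediately if it cannot.
- The lock holder writes its own PID (a+ mode does not truncate, so the previous holder's breadcrumb is preserved).
- Loser in normal mode: log + sys.exit(0). The accompanying launchd plist should use a KeepAlive dict with SuccessfulExit=false to prevent a spin-loop relaunch (this PR does not touch the plist because install.sh does not generate a standalone weixin plist; this is only a documentation-level recommendation for users who hand-build their launchd jobs).
- Loser in --login mode: log + sys.exit(1) as an explicit error (a silent exit would be confusing when V adds a new account).
- BSD flock: the OS releases the lock automatically when the process dies, so no atexit cleanup is needed.

Tested: with 4 launchd plists spawning concurrently, only 1 weixin_bot.py runs (/tmp/babata-weixin.lock contains the holder PID) and the other lock losers exit immediately.

Codex round 1 review raised 4 findings (deploy order / KeepAlive spin / PID truncate / --login silent); all are incorporated in this implementation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>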
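The locking scheme described in the commit message can be sketched roughly as follows. This is an illustrative reconstruction, not the code from the PR: the function names acquire_singleton_lock and loser_exit_code are hypothetical, and rewriting the PID only after the lock is held is an assumption about how "a+ does not truncate" and "lock file contains the holder PID" fit together.

```python
import fcntl
import os
import sys

LOCK_PATH = "/tmp/babata-weixin.lock"  # lock path named in the PR


def acquire_singleton_lock(path=LOCK_PATH):
    """Return an open file object holding an exclusive flock, or None if another holder exists."""
    # "a+" creates the file if needed and never truncates on open,
    # so a process that loses the race cannot wipe the holder's PID.
    fh = open(path, "a+")
    try:
        # LOCK_NB makes the call fail immediately instead of blocking.
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return None
    # Assumption: only the winner rewrites the file, after the lock is held.
    fh.seek(0)
    fh.truncate()
    fh.write(str(os.getpid()))
    fh.flush()
    # The caller must keep this object open; closing it releases the lock.
    return fh


def loser_exit_code(argv):
    """Exit code for a process that failed to get the lock."""
    if "--login" in argv:
        print("another weixin_bot.py holds the lock; cannot log in", file=sys.stderr)
        return 1  # explicit error, so adding an account never fails silently
    print("another weixin_bot.py holds the lock; exiting", file=sys.stderr)
    return 0  # clean exit, so launchd with SuccessfulExit=false does not relaunch
```

A caller would run `lock = acquire_singleton_lock()` at the very top of startup, call `sys.exit(loser_exit_code(sys.argv))` when it returns None, and otherwise keep `lock` referenced for the life of the process. Because flock is tied to the open file, the kernel drops the lock when the process dies, even after SIGKILL.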
Summary
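- weixin_bridge.start() unconditionally calls os.unlink(SOCKET_PATH) before bind, so a second weixin_bot.py does not fail on the occupied socket and takes it over instead. The previous instance does not die and keeps long-polling iLink, so the two processes race to handle every inbound message.
- weixin_bot.py now takes fcntl.flock(/tmp/babata-weixin.lock) at startup and exits immediately if it cannot get the lock. Whether the cause is a misconfiguration, a multi-launchd deployment, or repeated manual starts, the OS enforces that only one copy runs.
- Deployments with multiple launchd plists (each bot.py's _spawn_weixin_if_configured spawns its own copy) all hit this race. The OSS single-process mode is not directly affected but still benefits from this layer of defense.

Test plan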
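- Content of /tmp/babata-weixin.lock = holder PID
- PID truncate: open("a+") does not truncate
- --login silent exit (the loser distinguishes normal vs --login mode)
- KeepAlive uses a dict with SuccessfulExit=false, so a lock loser's exit 0 does not send launchd into a relaunch loop. install.sh does not currently generate a standalone weixin plist, so this PR only fixes code and leaves the templates alone; users who hand-build launchd jobs can refer to the _acquire_singleton_lock docstring.

Notes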
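- /tmp path choice: consistent with weixin_bridge.SOCKET_PATH.
- The os.unlink in weixin_bridge.start() stays. The singleton lock now backs it up: reaching that call means the process is the sole lock holder, so a socket that still exists is stale (left over from a previous SIGKILL).

🤖 Generated with Claude Code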
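For users who hand-build a launchd job, the KeepAlive recommendation from the test plan might look like the sketch below, written with Python's standard plistlib. The label and both paths are placeholders rather than values from this PR, and build_weixin_plist is a hypothetical helper. The authoritative guidance is the _acquire_singleton_lock docstring.

```python
import plistlib


def build_weixin_plist(python_path, script_path, label="com.example.babata-weixin"):
    """Return XML plist bytes for a launchd job that is relaunched only after a failure."""
    job = {
        "Label": label,  # placeholder label, not taken from the PR
        "ProgramArguments": [python_path, script_path],
        "RunAtLoad": True,
        # Dict form of KeepAlive: with SuccessfulExit=False launchd restarts
        # the job only when it exits non-zero, so a lock loser that exits 0
        # is left alone instead of being respawned in a loop.
        "KeepAlive": {"SuccessfulExit": False},
    }
    return plistlib.dumps(job)  # XML format by default
```

The bytes returned by `build_weixin_plist("/usr/bin/python3", "/path/to/weixin_bot.py")` can be written to a .plist file and loaded with launchctl. A plain `KeepAlive = true` would instead relaunch every lock loser regardless of exit code, which is the spin loop the PR warns about.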