optimize(rpctimeout): defense user provided non-compliant ctx with ctx.Done() triggered but ctx.Err() returning nil#1949

Closed
DMwangnima wants to merge 1 commit into cloudwego:main from DMwangnima:optimize/kitex-rpctimeout-defense

@DMwangnima DMwangnima commented Apr 15, 2026

What type of PR is this?

optimize

Check the PR title.

  • This PR title matches the format: <type>(optional scope): <description>
  • The description of this PR title is user-oriented and clear enough for others to understand.
  • Attach the PR updating the user documentation if the current PR requires user awareness at the usage level. User docs repo

(Optional) Translate the PR title into Chinese.

(Optional) More detailed description for this PR(en: English/zh: Chinese).

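en:

  • Background / Problem
    When a user-supplied ctx violates the context.Context contract, i.e. Done() has fired but Err() still returns nil (common in custom "WithoutCancel" wrappers that embed the parent context and thereby inherit its Done(), yet override Err() to return nil), the RPC timeout path hits a nil pointer panic:
  1. In timeoutTask.Wait(), Cancel(parentErr=nil) becomes a no-op, so timeoutContext.ch is never closed and Err() stays nil.
  2. Wait() returns (ctx, nil) → Call() treats the call as successful → sets recycleRI=true → PutRPCInfo recycles the RPCInfo (ri.to is zeroed).
  3. The worker goroutine is still running → it accesses ri.To().ServiceName() → nil pointer panic.
  4. In Run()'s recover, ClientPanicToErr calls ri.To() again → double panic.
  • Solution
  1. handleParentDone checks parentErr before calling Cancel; when it is nil, the sentinel error errUserProvidedContextNonCompliant is substituted, so Cancel always receives a non-nil error and ch is guaranteed to be closed.
  2. makeTimeoutErr recognizes the sentinel and returns ErrCanceledByBusiness directly with a clear hint, so the failure is not misreported as "rpc timeout".
  3. Wait() and waitNoTimeout() both go through handleParentDone, removing the divergence between the two branches.

zh(optional):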

(Optional) Which issue(s) this PR fixes:

(optional) The PR that updates user documentation:

@DMwangnima DMwangnima requested review from a team as code owners April 15, 2026 06:53

codecov Bot commented Apr 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 62.71%. Comparing base (bfa27f3) to head (4c2b01d).
⚠️ Report is 46 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1949      +/-   ##
==========================================
+ Coverage   61.37%   62.71%   +1.33%     
==========================================
  Files         388      393       +5     
  Lines       35063    30113    -4950     
==========================================
- Hits        21521    18885    -2636     
+ Misses      12247     9933    -2314     
  Partials     1295     1295              
Flag Coverage Δ
integration 51.66% <0.00%> (+1.15%) ⬆️
unit 53.09% <100.00%> (+1.45%) ⬆️


@DMwangnima DMwangnima force-pushed the optimize/kitex-rpctimeout-defense branch 3 times, most recently from de74e0e to a1fb208 Compare April 15, 2026 08:12
…x.Done() triggered but ctx.Err() returning nil
@DMwangnima DMwangnima force-pushed the optimize/kitex-rpctimeout-defense branch from a1fb208 to 4c2b01d Compare April 20, 2026 09:43
@DMwangnima DMwangnima closed this Apr 24, 2026