Skip to content

[lwp][rv64] fix potential signal handler infinite loop #10500

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

eatvector
Copy link
Contributor

@eatvector eatvector commented Jul 13, 2025

拉取/合并请求描述:(PR description)

修复用户态信号处理函数循环调用导致的执行流中断问题,关联#10501
[

为什么提交这份PR (why to submit this PR)

当前risc-v64信号处理机制存在风险,可能导致线程陷入无法退出的信号处理循环。这一设计缺陷会直接影响线程的正常执行流程,可能造成线程无法正常终止。

问题的典型场景出现在线程取消操作中。当线程A调用pthread_cancel试图取消线程B时,会向线程B发送SIGCANCEL信号。若线程B在执行地址addr处的代码后被信号中断,内核会保存当前上下文,并将控制流重定向到cancel_handler信号处理函数。

危险的情况在于,cancel_handler内部可能会触发新的SIGCANCEL信号,系统就会陷入一个处理循环。具体表现为:线程B执行完cancel_handler后,通过lwp_sigreturn触发系统调用返回内核;内核检测到cancel_handler内部发送的等待处理的SIGCANCEL信号后,会再次将控制流指向cancel_handler,而不是恢复原先被中断的执行点addr,如此形成无法终止的循环。

这种机制缺陷会导致线程B永远只能执行cancel_handler和lwp_sigreturn的代码,无法继续执行addr之后的代码。由于线程无法执行到取消点,pthread_exit也就永远不会被触发,线程B将无法被正常终止。

你的解决方案是什么 (what is your solution)

避免在arch_signal_quit信号处理返回路径中立即处理新信号,而是延迟到下次陷入内核重新返回用户态时,使得线程有机会从第一次被信号中断的地方执行后续代码。

请提供验证的bsp和config (provide the config and bsp)

  • BSP:
  • .config:
  • action:

]

当前拉取/合并请求的状态 Intent for your PR

必须选择一项 Choose one (Mandatory):

  • 本拉取/合并请求是一个草稿版本 This PR is for a code-review and is intended to get feedback
  • 本拉取/合并请求是一个成熟版本 This PR is mature, and ready to be integrated into the repo

代码质量 Code Quality:

我在这个拉取/合并请求中已经考虑了 As part of this pull request, I've considered the following:

  • 已经仔细查看过代码改动的对比 Already check the difference between PR and old code
  • 代码风格正确,包括缩进空格,命名及其他风格 Style guide is adhered to, including spacing, naming and other styles
  • 没有垃圾代码,代码尽量精简,不包含#if 0代码,不包含已经被注释了的代码 All redundant code is removed and cleaned up
  • 所有变更均有原因及合理的,并且不会影响到其他软件组件代码或BSP All modifications are justified and not affect other components or BSP
  • 对难懂代码均提供对应的注释 I've commented appropriately where code is tricky
  • 代码是高质量的 Code in this PR is of high quality
  • 已经使用formatting 等源码格式化工具确保格式符合RT-Thread代码规范 This PR complies with RT-Thread code specification
  • 如果是新增bsp, 已经添加ci检查到.github/workflows/bsp_buildings.yml 详细请参考链接BSP自查

@eatvector eatvector requested a review from BernardXiong as a code owner July 13, 2025 06:23
@github-actions github-actions bot added RT-Smart RT-Thread Smart related PR or issues component: lwp Component labels Jul 13, 2025
Copy link

github-actions bot commented Jul 13, 2025

📌 Code Review Assignment

🏷️ Tag: components

Reviewers: Maihuanyi

Changed Files (Click to expand)
  • components/lwp/arch/risc-v/rv64/lwp_gcc.S

🏷️ Tag: components_lwp

Reviewers: xu18838022837

Changed Files (Click to expand)
  • components/lwp/arch/risc-v/rv64/lwp_gcc.S

📊 Current Review Status (Last Updated: 2025-07-18 17:11 CST)

  • Maihuanyi Pending Review
  • xu18838022837 Pending Review

📝 Review Instructions

  1. 维护者可以通过单击此处来刷新审查状态: 🔄 刷新状态
    Maintainers can refresh the review status by clicking here: 🔄 Refresh Status

  2. 确认审核通过后评论 LGTM/lgtm
    Comment LGTM/lgtm after confirming approval

  3. PR合并前需至少一位维护者确认
    PR must be confirmed by at least one maintainer before merging

ℹ️ 刷新CI状态操作需要具备仓库写入权限。
ℹ️ Refresh CI status operation requires repository Write permission.

@eatvector eatvector changed the title [lwp][rv64] riscv: fix potential signal handler infinite loop [lwp][rv64] fix potential signal handler infinite loop Jul 13, 2025
@@ -119,7 +142,7 @@ arch_signal_quit:

RESTORE_ALL
SAVE_ALL
j arch_ret_to_user
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

可以结合下aarch64的情况,查看下这部分如何处理比较合适

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

您好,aarch64和riscv64的处理都是类似的,避免在arch_signal_quit->arch_ret_to_user->lwp_thread_signal_catch 这条调用链上使用lwp_thread_signal_catch 处理信号即可,其余部分保持不变。我尝试了三种可能的修改方案:

1 通过给arch_ret_to_user传入一个额外参数已指示是否需要调用lwp_thread_signal_catch,这种方法可以让线程通过统一的arch_ret_to_user返回用户态。但是所有调用arch_ret_to_user的地方都需要额外传入参数,实现示例链接:https://github.com/RT-Thread/rt-thread/compare/master...eatvector:rt-thread:demo0?expand=1

2 单独针对arch_signal_quit实现一个独立的返回用户态调用接口,与之前提交的方案类似,不过复用了arch_ret_to_user的代码已减少重复代码,这种方案应该是在riscv64和aarch64上最清晰,修改也最少的实现,示例链接:https://github.com/RT-Thread/rt-thread/compare/master...eatvector:rt-thread:demo1?expand=1

3 在线程结构体内部增加额外标记已指示线程是否通过arch_signal_quit返回用户态,如果该标志被设置,则在lwp_thread_signal_catch 不进行任何处理,这种方法所需修改的架构相关内容最少,但是得修改相关的c文件,以及相关结构体,实现示例:https://github.com/RT-Thread/rt-thread/compare/master...eatvector:rt-thread:demo3?expand=1

不知道您认为哪种修改方案更好,或者有其他更好的建议吗?

@eatvector eatvector force-pushed the fix/signal-handler-loop branch 2 times, most recently from c1b8586 to ee02769 Compare July 17, 2025 04:06
@kurisaW
Copy link
Member

kurisaW commented Jul 17, 2025

ci问题需要等该PR合并后即可正常:

@eatvector
Copy link
Contributor Author

ci问题需要等该PR合并后即可正常:

好的

@kurisaW kurisaW closed this Jul 18, 2025
@kurisaW kurisaW reopened this Jul 18, 2025
@eatvector eatvector force-pushed the fix/signal-handler-loop branch from ee02769 to a725673 Compare July 18, 2025 09:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: lwp Component RT-Smart RT-Thread Smart related PR or issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants