[lwp][rv64] fix potential signal handler infinite loop #10500
📌 Code Review Assignment
🏷️ Tag: components, Reviewers: Maihuanyi
🏷️ Tag: components_lwp, Reviewers: xu18838022837
📊 Current Review Status (Last Updated: 2025-07-18 17:11 CST)
@@ -119,7 +142,7 @@ arch_signal_quit:
    RESTORE_ALL
    SAVE_ALL
    j arch_ret_to_user
Could you also look at how aarch64 handles this and work out the most appropriate way to treat this part?
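Hello. aarch64 and riscv64 are handled in much the same way: it is enough to avoid handling signals with lwp_thread_signal_catch on the call chain arch_signal_quit -> arch_ret_to_user -> lwp_thread_signal_catch, and everything else stays unchanged. I tried three possible approaches:
1. Pass an extra argument to arch_ret_to_user that indicates whether lwp_thread_signal_catch should be called. This lets every thread return to user mode through a single, unified arch_ret_to_user, but every call site of arch_ret_to_user has to pass the extra argument. Example implementation: https://github.com/RT-Thread/rt-thread/compare/master...eatvector:rt-thread:demo0?expand=1
2. Implement a separate return-to-user entry point used only by arch_signal_quit. This is similar to the approach submitted earlier, but it reuses the arch_ret_to_user code to reduce duplication. It is probably the clearest implementation on riscv64 and aarch64 and needs the fewest changes. Example: https://github.com/RT-Thread/rt-thread/compare/master...eatvector:rt-thread:demo1?expand=1
3. Add a flag to the thread structure that indicates whether the thread is returning to user mode via arch_signal_quit. If the flag is set, lwp_thread_signal_catch does nothing. This approach needs the fewest architecture-specific changes, but it requires modifying the related C files and structures. Example implementation: https://github.com/RT-Thread/rt-thread/compare/master...eatvector:rt-thread:demo3?expand=1
Which of these approaches do you think is better, or do you have another suggestion?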
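To make the difference between approaches 1 and 3 concrete, here is a minimal C sketch of the two ways of gating the signal check. It is a toy model, not RT-Thread code: `toy_thread`, `toy_signal_catch`, `toy_ret_to_user_opt1`, and `toy_signal_catch_opt3` are hypothetical names, and the real return path is written in assembly.

```c
#include <stdbool.h>

/* Toy model only: none of these names are RT-Thread APIs. */
struct toy_thread {
    bool sig_pending;       /* a signal is waiting to be delivered */
    bool from_signal_quit;  /* approach 3: set on the sigreturn path */
};

/* Stand-in for the signal catch step: returns true if a pending
 * signal was consumed and control would go to the user handler. */
static bool toy_signal_catch(struct toy_thread *t)
{
    if (!t->sig_pending) {
        return false;
    }
    t->sig_pending = false;
    return true;
}

/* Approach 1: the caller states explicitly whether signals are checked. */
bool toy_ret_to_user_opt1(struct toy_thread *t, bool check_signals)
{
    return check_signals && toy_signal_catch(t);
}

/* Approach 3: the catch step consults a per-thread flag and skips one
 * check when the thread is coming back through the sigreturn path. */
bool toy_signal_catch_opt3(struct toy_thread *t)
{
    if (t->from_signal_quit) {
        t->from_signal_quit = false;  /* one-shot: only this return is skipped */
        return false;
    }
    return toy_signal_catch(t);
}
```

In both variants the pending signal is left in place when the check is skipped, so it is delivered on a later return to user mode. Approach 1 puts the decision at every call site, while approach 3 keeps call sites unchanged at the cost of extra per-thread state.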
c1b8586 to ee02769
The CI problem should go back to normal once that PR is merged:
OK
ee02769 to a725673
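Pull/merge request description (PR description)
Fix an execution-flow interruption caused by the user-space signal handler being invoked in a loop. Related to #10501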
[
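Why submit this PR
The current risc-v64 signal handling mechanism carries a risk: a thread can get stuck in a signal-handling loop that it cannot leave. This design flaw directly affects the thread's normal execution flow and may prevent the thread from terminating normally.
A typical scenario arises during thread cancellation. When thread A calls pthread_cancel to cancel thread B, it sends SIGCANCEL to thread B. If thread B is interrupted by the signal after executing the code at address addr, the kernel saves the current context and redirects control flow to the cancel_handler signal handler.
The danger is that cancel_handler may itself raise a new SIGCANCEL, in which case the system falls into a handling loop. Concretely: after thread B finishes cancel_handler, it returns to the kernel through the system call issued by lwp_sigreturn. The kernel then detects the pending SIGCANCEL that was sent from inside cancel_handler and points control flow at cancel_handler again instead of resuming the originally interrupted point addr, which forms a loop that never terminates.
Because of this flaw, thread B can only ever execute the code of cancel_handler and lwp_sigreturn and never gets to run the code after addr. Since the thread can never reach a cancellation point, pthread_exit is never triggered, and thread B cannot be terminated normally.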
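The loop described above can be reproduced in a small C simulation. This is only a sketch under simplifying assumptions: `toy_thread`, `toy_cancel_handler`, `toy_ret_to_user_always_check`, and `toy_simulate_loop` are hypothetical names, the handler is assumed to re-raise the signal on every run, and one "return to user" stands in for the whole lwp_sigreturn round trip.

```c
#include <stdbool.h>

/* Toy model only: these names are illustrative, not RT-Thread APIs. */
struct toy_thread {
    bool sig_pending;    /* a SIGCANCEL-like signal is waiting */
    int  handler_runs;   /* how many times the user handler ran */
    int  addr_progress;  /* steps executed after the interrupted point */
};

/* User-space handler that raises the same signal again (assumed worst case). */
static void toy_cancel_handler(struct toy_thread *t)
{
    t->handler_runs++;
    t->sig_pending = true;
}

/* Return-to-user path that checks for pending signals on every return,
 * including the return that follows sigreturn. */
static void toy_ret_to_user_always_check(struct toy_thread *t)
{
    if (t->sig_pending) {
        t->sig_pending = false;
        toy_cancel_handler(t);  /* handler runs, then issues sigreturn */
        return;
    }
    t->addr_progress++;         /* only reached when nothing is pending */
}

/* Run a bounded number of kernel-to-user returns. Returns the progress
 * made after addr and stores the handler invocation count. */
int toy_simulate_loop(int max_returns, int *handler_runs)
{
    struct toy_thread t = { .sig_pending = true };

    for (int i = 0; i < max_returns; i++) {
        toy_ret_to_user_always_check(&t);
    }
    *handler_runs = t.handler_runs;
    return t.addr_progress;
}
```

In this model every return to user mode is spent in the handler, so the progress counter never moves regardless of how many returns are simulated.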
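What is your solution
Do not handle new signals immediately on the arch_signal_quit signal-return path. Instead, defer them until the next time the thread traps into the kernel and returns to user mode, so that the thread gets a chance to run the code that follows the point where it was first interrupted.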
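A minimal C sketch of the deferred behaviour follows. It is a toy model under stated assumptions rather than the actual patch: `toy_thread`, `toy_ret_to_user`, `toy_signal_quit_exit`, and `toy_simulate_deferred` are hypothetical names, and the handler is assumed to re-raise the signal every time it runs.

```c
#include <stdbool.h>

/* Toy model only: these names are illustrative, not RT-Thread APIs. */
struct toy_thread {
    bool sig_pending;    /* a SIGCANCEL-like signal is waiting */
    int  handler_runs;   /* how many times the user handler ran */
    int  addr_progress;  /* steps executed after the interrupted point */
};

/* User-space handler that raises the same signal again (assumed worst case). */
static void toy_cancel_handler(struct toy_thread *t)
{
    t->handler_runs++;
    t->sig_pending = true;
}

/* Ordinary trap exit: deliver a pending signal if there is one.
 * Returns true if the handler was entered. */
static bool toy_ret_to_user(struct toy_thread *t)
{
    if (t->sig_pending) {
        t->sig_pending = false;
        toy_cancel_handler(t);
        return true;
    }
    t->addr_progress++;
    return false;
}

/* Signal-return exit: restore the interrupted context WITHOUT checking
 * for signals, so the thread resumes after addr. */
static void toy_signal_quit_exit(struct toy_thread *t)
{
    t->addr_progress++;
}

/* Each iteration models one trap into the kernel. A delivered handler
 * ends with sigreturn, which leaves through the non-checking exit. */
int toy_simulate_deferred(int traps, int *handler_runs)
{
    struct toy_thread t = { .sig_pending = true };

    for (int i = 0; i < traps; i++) {
        if (toy_ret_to_user(&t)) {
            toy_signal_quit_exit(&t);
        }
    }
    *handler_runs = t.handler_runs;
    return t.addr_progress;
}
```

Even with a handler that re-raises the signal on every run, the thread in this model makes one step of progress per trap, which is what allows it to eventually reach a cancellation point. The re-raised signal is not lost; it stays pending and is delivered on the next ordinary trap exit.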
Please provide the BSP and config used for verification
]
Intent for your PR
Choose one (Mandatory):
Code Quality:
As part of this pull request, I've considered the following:
No #if 0 code and no commented-out code; all redundant code is removed and cleaned up