Background
In version 3.1, PaddlePaddle introduced a CUDA-like hardware integration scheme. It builds on the Custom Device integration scheme, and its key advantage is that it can reuse a large number of CUDA kernels from the PHI operator library. The scheme has already been used to integrate MetaX (metax_gpu) and Iluvatar CoreX (iluvatar_gpu).
However, some CUDA kernels in the PHI operator library were not written with reuse by other modules in mind, which leads to the following problem: some kernels lack function declarations, so CUDA-like hardware backends have to `#include` the `.cu` source files directly, which violates the coding guidelines.
This activity therefore aims to normalize the CUDA kernels in the PHI operator library:
- add the missing declaration headers (`.h`) for these kernels in the Paddle repository (see the sketch below);
- fix the incorrect `#include *.cu` usage in the PaddleCustomDevice repository and switch it to an `#include` of the correct header.
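As a rough illustration of the Paddle-side fix, the sketch below shows what such a declaration header could look like. The file name `example_kernel.h` and the kernel name `ExampleKernel` are hypothetical; a real header should mirror the declaration style already used by other PHI kernel headers.

```cpp
// Hypothetical header: paddle/phi/kernels/example_kernel.h
// (ExampleKernel is an illustrative name, not one of the 136 kernels listed below.)
#pragma once

#include "paddle/phi/core/dense_tensor.h"

namespace phi {

// Declaration of the kernel template that was previously only defined in
// the corresponding gpu/*.cu source file. With this header in place, other
// backends can include the declaration instead of the .cu file itself.
template <typename T, typename Context>
void ExampleKernel(const Context& dev_ctx,
                   const DenseTensor& x,
                   DenseTensor* out);

}  // namespace phi
```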
Scope
- Repositories involved: Paddle and PaddleCustomDevice
- Affected files: in the PaddleCustomDevice repository, all operator kernel `.cu` source files that are `#include`d into registration files, 136 in total.
The complete file list is given in the table below.
Task
Fix goals
- In the PaddlePaddle repository, add header files for the kernels that lack declarations;
- In the PaddleCustomDevice repository, change the incorrect `#include *.cu` to an `#include` of the newly added header, and add the kernel implementation (`.cu`) to the CMakeLists build list (see the sketch after this list). The code that needs to change only lives under the backends/metax_gpu and backends/iluvatar_gpu directories.
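For the PaddleCustomDevice side, a minimal sketch of the intended change is shown below, assuming the declaration header from the previous sketch has been merged; the file paths and names are again illustrative, not taken from the task list.

```cpp
// Hypothetical kernel registration file under backends/metax_gpu.
//
// Before: the backend pulled in the CUDA implementation directly, e.g.
//   #include "paddle/phi/kernels/gpu/example_kernel.cu"
//
// After: only the declaration header is included here; the .cu file is
// added to the backend's CMakeLists source list instead, so the
// implementation is still compiled into the plugin.
#include "paddle/phi/kernels/example_kernel.h"
```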
No. | File | Assignee / Status / PR |
---|---|---|
1 | paddle/phi/kernels/fusion/gpu/distributed_fused_lamb_init_kernel.cu | @Le-soleile @YqGe585 |
2 | paddle/phi/kernels/fusion/gpu/fused_bias_act_kernel.cu | @Le-soleile |
3 | paddle/phi/kernels/fusion/gpu/fused_bias_dropout_residual_layer_norm_grad_kernel.cu | @wanglezz |
4 | paddle/phi/kernels/fusion/gpu/fused_bias_dropout_residual_layer_norm_kernel.cu | @wanglezz |
5 | paddle/phi/kernels/fusion/gpu/fused_embedding_eltwise_layernorm_kernel.cu | @wanglezz |
6 | paddle/phi/kernels/fusion/gpu/fused_layernorm_kernel.cu | @WanRui37 |
7 | paddle/phi/kernels/fusion/gpu/fused_seqpool_cvm_grad_kernel.cu | @SpongeBob0318 |
8 | paddle/phi/kernels/fusion/gpu/fused_seqpool_cvm_kernel.cu | @SpongeBob0318 |
9 | paddle/phi/kernels/fusion/gpu/fused_softmax_mask_grad_kernel.cu | @SpongeBob0318 |
10 | paddle/phi/kernels/fusion/gpu/fused_softmax_mask_kernel.cu | @youge325 |
11 | paddle/phi/kernels/fusion/gpu/fused_softmax_mask_upper_triangle_kernel.cu | |
12 | paddle/phi/kernels/fusion/gpu/fused_stack_transpose_quant_kernel.cu | @youge325 |
13 | paddle/phi/kernels/fusion/gpu/fused_transpose_split_quant_kernel.cu | @SpongeBob0318 |
14 | paddle/phi/kernels/fusion/gpu/fused_transpose_wlch_split_quant_kernel.cu | @SpongeBob0318 |
15 | paddle/phi/kernels/fusion/gpu/fusion_group_kernel.cu | @SpongeBob0318 |
16 | paddle/phi/kernels/fusion/gpu/masked_multihead_attention_kernel.cu | @Le-soleile |
17 | paddle/phi/kernels/fusion/gpu/qkv_unpack_mha_kernel.cu | @Le-soleile |
18 | paddle/phi/kernels/fusion/gpu/skip_layernorm_kernel.cu | @SpongeBob0318 |
19 | paddle/phi/kernels/gpu/affine_channel_grad_kernel.cu | @SpongeBob0318 |
20 | paddle/phi/kernels/gpu/affine_channel_kernel.cu | @SpongeBob0318 |
21 | paddle/phi/kernels/gpu/ap_facade_kernel.cu | @youge325 @Echo-Nie |
22 | paddle/phi/kernels/gpu/ap_trivial_fusion_begin_kernel.cu | @youge325 |
23 | paddle/phi/kernels/gpu/ap_trivial_fusion_end_kernel.cu | @youge325 |
24 | paddle/phi/kernels/gpu/ap_variadic_kernel.cu | @youge325 |
25 | paddle/phi/kernels/gpu/argsort_grad_kernel.cu | |
26 | paddle/phi/kernels/gpu/barrier_kernel.cu | @youge325 |
27 | paddle/phi/kernels/gpu/bce_loss_grad_kernel.cu | @Luxorion-12 |
28 | paddle/phi/kernels/gpu/bce_loss_kernel.cu | @tjujingzong |
29 | paddle/phi/kernels/gpu/binomial_kernel.cu | @tjujingzong |
30 | paddle/phi/kernels/gpu/bmm_grad_kernel.cu | @tjujingzong |
31 | paddle/phi/kernels/gpu/bmm_kernel.cu | @tjujingzong |
32 | paddle/phi/kernels/gpu/box_clip_kernel.cu | @algorithm1832 |
33 | paddle/phi/kernels/gpu/c_concat_kernel.cu | @algorithm1832 |
34 | paddle/phi/kernels/gpu/c_embedding_grad_kernel.cu | @algorithm1832 |
35 | paddle/phi/kernels/gpu/c_scatter_kernel.cu | @algorithm1832 |
36 | paddle/phi/kernels/gpu/c_softmax_with_cross_entropy_grad_kernel.cu | @youge325 |
37 | paddle/phi/kernels/gpu/cast_kernel.cu | |
38 | paddle/phi/kernels/gpu/class_center_sample_kernel.cu | |
39 | paddle/phi/kernels/gpu/collect_fpn_proposals_kernel.cu | @youge325 |
40 | paddle/phi/kernels/gpu/comm_init_all_kernel.cu | @youge325 |
41 | paddle/phi/kernels/gpu/complex_kernel.cu | |
42 | paddle/phi/kernels/gpu/correlation_grad_kernel.cu | @tjujingzong |
43 | paddle/phi/kernels/gpu/correlation_kernel.cu | @youge325 |
44 | paddle/phi/kernels/gpu/ctc_align_kernel.cu | |
45 | paddle/phi/kernels/gpu/cvm_grad_kernel.cu | @Le-soleile |
46 | paddle/phi/kernels/gpu/cvm_kernel.cu | @Le-soleile |
47 | paddle/phi/kernels/gpu/deformable_conv_grad_kernel.cu | @SqZhang666 |
48 | paddle/phi/kernels/gpu/deformable_conv_kernel.cu | @SqZhang666 |
49 | paddle/phi/kernels/gpu/elementwise_grad_kernel.cu | |
50 | paddle/phi/kernels/gpu/embedding_with_scaled_gradient_grad_kernel.cu | |
51 | paddle/phi/kernels/gpu/exponential_kernel.cu | |
52 | paddle/phi/kernels/gpu/flip_kernel.cu | |
53 | paddle/phi/kernels/gpu/fused_token_prune_kernel.cu | @Le-soleile |
54 | paddle/phi/kernels/gpu/gather_grad_kernel.cu | |
55 | paddle/phi/kernels/gpu/gelu_grad_kernel.cu | |
56 | paddle/phi/kernels/gpu/global_gather_kernel.cu | @Le-soleile |
57 | paddle/phi/kernels/gpu/global_scatter_kernel.cu | @Le-soleile |
58 | paddle/phi/kernels/gpu/group_norm_grad_kernel.cu | @chenjin060204 |
59 | paddle/phi/kernels/gpu/group_norm_kernel.cu | @chenjin060204 |
60 | paddle/phi/kernels/gpu/gru_kernel.cu | @algorithm1832 |
61 | paddle/phi/kernels/gpu/index_add_grad_kernel.cu | @algorithm1832 |
62 | paddle/phi/kernels/gpu/interpolate_grad_kernel.cu | @algorithm1832 |
63 | paddle/phi/kernels/gpu/interpolate_kernel.cu | @algorithm1832 |
64 | paddle/phi/kernels/gpu/kldiv_loss_grad_kernel.cu | @algorithm1832 |
65 | paddle/phi/kernels/gpu/kldiv_loss_kernel.cu | |
66 | paddle/phi/kernels/gpu/l1_norm_grad_kernel.cu | @Le-soleile |
67 | paddle/phi/kernels/gpu/l1_norm_kernel.cu | |
68 | paddle/phi/kernels/gpu/label_smooth_grad_kernel.cu | |
69 | paddle/phi/kernels/gpu/label_smooth_kernel.cu | |
70 | paddle/phi/kernels/gpu/lamb_kernel.cu | @dh-Unicorn |
71 | paddle/phi/kernels/gpu/lgamma_kernel.cu | @dh-Unicorn |
72 | paddle/phi/kernels/gpu/log_softmax_grad_kernel.cu | @dh-Unicorn |
73 | paddle/phi/kernels/gpu/logsumexp_kernel.cu | |
74 | paddle/phi/kernels/gpu/lookup_table_grad_kernel.cu | @Le-soleile |
75 | paddle/phi/kernels/gpu/lookup_table_kernel.cu | @Le-soleile |
76 | paddle/phi/kernels/gpu/lu_solve_kernel.cu | |
77 | paddle/phi/kernels/gpu/margin_cross_entropy_kernel.cu | |
78 | paddle/phi/kernels/gpu/matrix_power_grad_kernel.cu | |
79 | paddle/phi/kernels/gpu/matrix_power_kernel.cu | |
80 | paddle/phi/kernels/gpu/mean_all_grad_kernel.cu | |
81 | paddle/phi/kernels/gpu/moe_unpermute_kernel.cu | @Le-soleile |
82 | paddle/phi/kernels/gpu/momentum_kernel.cu | |
83 | paddle/phi/kernels/gpu/mp_allreduce_sum_kernel.cu | |
84 | paddle/phi/kernels/gpu/multiclass_nms3_kernel.cu | |
85 | paddle/phi/kernels/gpu/multiplex_grad_kernel.cu | |
86 | paddle/phi/kernels/gpu/nonzero_kernel.cu | |
87 | paddle/phi/kernels/gpu/pad3d_kernel.cu | |
88 | paddle/phi/kernels/gpu/partial_allgather_kernel.cu | @Le-soleile |
89 | paddle/phi/kernels/gpu/partial_concat_grad_kernel.cu | @Le-soleile |
90 | paddle/phi/kernels/gpu/partial_concat_kernel.cu | |
91 | paddle/phi/kernels/gpu/partial_recv_kernel.cu | @Le-soleile |
92 | paddle/phi/kernels/gpu/partial_send_kernel.cu | @Le-soleile |
93 | paddle/phi/kernels/gpu/psroi_pool_grad_kernel.cu | @xxiu1 |
94 | paddle/phi/kernels/gpu/quantize_linear_kernel.cu | |
95 | paddle/phi/kernels/gpu/reduce_kernel.cu | |
96 | paddle/phi/kernels/gpu/repeat_interleave_grad_kernel.cu | @SqZhang666 |
97 | paddle/phi/kernels/gpu/repeat_interleave_kernel.cu | @SqZhang666 |
98 | paddle/phi/kernels/gpu/rmsprop_kernel.cu | |
99 | paddle/phi/kernels/gpu/roi_align_grad_kernel.cu | |
100 | paddle/phi/kernels/gpu/roi_align_kernel.cu | @Le-soleile |
101 | paddle/phi/kernels/gpu/row_conv_grad_kernel.cu | @Le-soleile |
102 | paddle/phi/kernels/gpu/row_conv_kernel.cu | @Le-soleile |
103 | paddle/phi/kernels/gpu/seed_kernel.cu | @Le-soleile |
104 | paddle/phi/kernels/gpu/sequence_expand_kernel.cu | @Le-soleile |
105 | paddle/phi/kernels/gpu/set_value_kernel.cu | @Le-soleile |
106 | paddle/phi/kernels/gpu/shuffle_channel_grad_kernel.cu | @Le-soleile |
107 | paddle/phi/kernels/gpu/shuffle_channel_kernel.cu | @Le-soleile |
108 | paddle/phi/kernels/gpu/soft_relu_grad_kernel.cu | @Le-soleile |
109 | paddle/phi/kernels/gpu/spectral_norm_grad_kernel.cu | @Le-soleile |
110 | paddle/phi/kernels/gpu/spectral_norm_kernel.cu | @Le-soleile |
111 | paddle/phi/kernels/gpu/stack_grad_kernel.cu | |
112 | paddle/phi/kernels/gpu/stft_grad_kernel.cu | @Le-soleile |
113 | paddle/phi/kernels/gpu/sync_batch_norm_grad_kernel.cu | |
114 | paddle/phi/kernels/gpu/top_k_kernel.cu | |
115 | paddle/phi/kernels/gpu/uniform_random_batch_size_like_kernel.cu | @Le-soleile |
116 | paddle/phi/kernels/gpu/weighted_sample_neighbors_kernel.cu | |
117 | paddle/phi/kernels/gpu/yolo_box_head_kernel.cu | @Le-soleile |
118 | paddle/phi/kernels/gpu/yolo_box_post_kernel.cu | @Le-soleile |
119 | paddle/phi/kernels/kps/elementwise_kernel.cu | |
120 | paddle/phi/kernels/legacy/gpu/cal_aux_loss_grad_kernel.cu | @Le-soleile |
121 | paddle/phi/kernels/legacy/gpu/cal_aux_loss_kernel.cu | @Le-soleile |
122 | paddle/phi/kernels/legacy/gpu/expand_modality_expert_id_kernel.cu | @Le-soleile |
123 | paddle/phi/kernels/legacy/gpu/ext_build_src_rank_and_local_expert_id_kernel.cu | @Le-soleile |
124 | paddle/phi/kernels/legacy/gpu/fp8_quant_blockwise_kernel.cu | @Le-soleile |
125 | paddle/phi/kernels/legacy/gpu/int_bincount.cu | @junhaoguo809-crypto |
126 | paddle/phi/kernels/legacy/gpu/layer_norm_cuda_kernel.cu | @junhaoguo809-crypto |
127 | paddle/phi/kernels/legacy/gpu/moe_combine_grad_kernel.cu | @junhaoguo809-crypto |
128 | paddle/phi/kernels/legacy/gpu/moe_combine_kernel.cu | @junhaoguo809-crypto |
129 | paddle/phi/kernels/legacy/gpu/moe_combine_no_weight_kernel.cu | @junhaoguo809-crypto |
130 | paddle/phi/kernels/legacy/gpu/moe_gate_dispatch_grad_kernel.cu | @junhaoguo809-crypto |
131 | paddle/phi/kernels/legacy/gpu/moe_gate_dispatch_kernel.cu | |
132 | paddle/phi/kernels/legacy/gpu/moe_gate_dispatch_permute_grad_kernel.cu | @Le-soleile |
133 | paddle/phi/kernels/legacy/gpu/moe_gate_dispatch_permute_kernel.cu | @Le-soleile |
134 | paddle/phi/kernels/legacy/gpu/moe_ops_partial_nosoftmaxtopk_grad_kernel.cu | @Le-soleile |
135 | paddle/phi/kernels/legacy/gpu/moe_ops_partial_nosoftmaxtopk_kernel.cu | @Le-soleile |
136 | paddle/phi/kernels/legacy/kps/compare_kernel.cu | |
Example fix & how to submit code
See #75226 (comment).
How to claim a task
Please claim tasks by leaving a comment on this issue, for example:
【报名】:1、3、2-3
- Separate multiple task numbers with the Chinese enumeration comma (、); consecutive tasks can be claimed as a range written with a hyphen, e.g. 1-2
- PR submission format:
  - Submit separate PRs to the two repositories; submit the PaddleCustomDevice PR only after the Paddle PR has been merged
  - In both repositories, the PR title must start with 【CUDA Kernel No.xxx】, indicating the task number
  - The Paddle PR title must end with -part
Dashboard
Task category | Number of tasks | Submitted / Claimed | Submission rate | Completed | Completion rate |
---|---|---|---|---|---|
CUDA kernel normalization | 136 | 71 / 96 | 52.21% | 3 | 2.21% |
Statistics
In no particular order: @SpongeBob0318 (2), @Le-soleile (1)