@khallouh khallouh commented Apr 28, 2025

This allows chaining when the base pointer is also used in other basic blocks, but only when chaining is considered profitable:

  • Chaining is not applied inside a loop, because the resulting copy would be more costly.
  • The cost of chaining is incremented for each offset that falls outside the load/store immediate ranges.
  • An experimental threshold on the computed cost determines whether chaining is profitable (e.g., half the number of pointer adds to be chained).
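
The profitability rules above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the names `ChainCandidate`, `chaining_cost`, and `is_chaining_profitable`, the immediate range of [-128, 127], and the direction of the threshold comparison (cost at or below the threshold counts as profitable) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChainCandidate:
    """Hypothetical description of a group of pointer adds on one base pointer."""
    offsets: List[int]  # offsets the memory accesses would use after chaining
    in_loop: bool       # True if the chain would be created inside a loop


def chaining_cost(offsets: List[int], imm_min: int, imm_max: int) -> int:
    """Add one unit of cost per offset outside the load/store immediate range."""
    return sum(1 for off in offsets if not (imm_min <= off <= imm_max))


def is_chaining_profitable(cand: ChainCandidate,
                           imm_min: int = -128,
                           imm_max: int = 127) -> bool:
    """Decide whether to chain; range and comparison direction are assumed."""
    if cand.in_loop:
        # The extra copy of the base pointer is assumed too costly in a loop.
        return False
    # Example threshold from the description: half the pointer adds to chain.
    threshold = len(cand.offsets) / 2
    return chaining_cost(cand.offsets, imm_min, imm_max) <= threshold
```

With these assumed values, a group of four pointer adds with offsets 0, 32, 64, and 512 has a cost of 1 against a threshold of 2 and would be chained, while the same group inside a loop would not.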

QoR results for the affected kernels are shown below; the rest are unchanged:

|---------------------------------------------|------------------------------------------|--------------|--------------|------------------------|--------------|------------------------|--------------|--------------------------------|-------------------------|------------------------------|-----------------------|--------------------------------|---------------|-----------------------|-------------------------|-------------------------|--------------|--------------|------------------------------|--------------------|-------------------------|------------------------------|---------------|---------------------------|---------------------------|---------------|-------------------------|---------------|----------------|---------------|----------------------|----------------|-------------------------------|-----------------------------------|------------------|----------------------------|------------------------------|-------------------------------|------------------------------|------------------------|-------------------------------|----------------------------|------------------|--------------------|---------------------------|-----------------------|-------------------------------|------------------------|--------------|---------------------------|---------------------------|------------------|-----------------------|------------------|------------------|------------------|------------------|---------------------------|---------------------------|---------------------------|----------------------------|------------------|------------------|-----------------------------------|------------------------|--------------|------------------|------------------|------------------------|------------------|------------------|--------------|------------|-------------|-------------|-------------|
| Core_Compute_Insn_Count                     | CompareOps_K_EQ_GE_GT_LE_LT_CMP_GT_int32 | GemmA16W8_0  | GemmA16W4_0  | Conv1D_DW_AIE2p_int8_0 | GemmA16W4_1  | Conv1D_DW_AIE2p_int8_1 | GemmA16W8_1  | Conv2D_Transpose_bfp16_AIE2p_1 | MaxPool2dVariant_bf16_1 | Conv2D_Transpose_bf16_AIE2_1 | Conv1D_DW_AIE2_bf16_0 | Conv2D_Transpose_bfp16_AIE2p_0 | Conv2DA16W8_1 | Conv1D_DW_AIE2_bf16_1 | MaxPool2dVariant_bf16_0 | GEMM_Bf16xBfp16_1_AIE2p | GemmA16A16_1 | GemmA16A16_0 | MaxPool2dVariant_aie2_int8_0 | Conv2D_bfp16_OC8_0 | Conv2D_Transpose_int8_1 | MaxPool2dVariant_aie2_int8_1 | Conv2DA16W8_0 | Conv2D_bfp16_PSUM_FLOAT_1 | Conv2D_bfp16_PSUM_FLOAT_0 | ArgMin_bf16_1 | Conv2D_Transpose_int8_0 | ArgMax_bf16_0 | Conv2D_bfp16_0 | ArgMin_bf16_0 | Cumsum_AIE2_bfloat16 | Conv2D_bfp16_1 | InterpolateLinear1D_AIE2_int8 | InterpolateLinear1D_AIE2_bfloat16 | Conv2D_DW_bf16_1 | ReduceProdAxis_1_aie2_bf16 | LayerNormC8Part2_aie2_bf16_0 | InstanceNormPart2_aie2_bf16_0 | LayerNormC8Part2_aie2_int8_0 | Conv2D_bf16_FC_AIE2p_0 | ReduceMeanTemplated_AIE2_int8 | ReduceMeanAxis_1_aie2_bf16 | Conv2D_DW_bf16_0 | Conv2D_bfp16_OC8_1 | ReduceSumAxis_1_aie2_bf16 | Range_bfloat16_aie2_0 | InstanceNormPart2_aie2_int8_0 | Conv2D_bf16_FC_AIE2p_1 | Conv2D_DW_0  | ReduceMaxAxis_1_aie2_bf16 | ReduceMinAxis_1_aie2_bf16 | ReduceSum_int8_1 | Range_bfloat16_aie2_1 | ReduceMax_int8_1 | ReduceMax_bf16_1 | ReduceMin_int8_1 | ReduceMin_bf16_1 | ReduceMaxAxis_1_aie2_int8 | ReduceMinAxis_1_aie2_int8 | ReduceSumAxis_1_aie2_int8 | ReduceMeanAxis_1_aie2_int8 | ReduceMax_int8_0 | ReduceSum_int8_0 | ReduceMeanTemplated_AIE2_bfloat16 | ReduceProd_bf16_0_AIE2 | Conv2D_DW_1  | ReduceMax_bf16_0 | ReduceMin_bf16_0 | ReduceProd_bf16_1_AIE2 | ReduceSum_bf16_0 | ReduceMin_int8_0 | Average diff | Diff stdev | Quantile #1 | Quantile #2 | Quantile #3 |
|---------------------------------------------|------------------------------------------|--------------|--------------|------------------------|--------------|------------------------|--------------|--------------------------------|-------------------------|------------------------------|-----------------------|--------------------------------|---------------|-----------------------|-------------------------|-------------------------|--------------|--------------|------------------------------|--------------------|-------------------------|------------------------------|---------------|---------------------------|---------------------------|---------------|-------------------------|---------------|----------------|---------------|----------------------|----------------|-------------------------------|-----------------------------------|------------------|----------------------------|------------------------------|-------------------------------|------------------------------|------------------------|-------------------------------|----------------------------|------------------|--------------------|---------------------------|-----------------------|-------------------------------|------------------------|--------------|---------------------------|---------------------------|------------------|-----------------------|------------------|------------------|------------------|------------------|---------------------------|---------------------------|---------------------------|----------------------------|------------------|------------------|-----------------------------------|------------------------|--------------|------------------|------------------|------------------------|------------------|------------------|--------------|------------|-------------|-------------|-------------|
| mllib_check-vpush5_peano                    |                                      531 |         6337 |        11196 |                   1289 |          552 |                   1682 |         6337 |                           5364 |                    1417 |                         7602 |                  3899 |                           4509 |          2507 |                  4215 |                    2413 |                     610 |         1665 |         2785 |                         1453 |               6566 |                    8300 |                         2107 |         14774 |                      4292 |                      5684 |         29574 |                   44518 |         49356 |           9884 |         52428 |                16645 |          17958 |                          9967 |                             12200 |             4394 |                      21208 |                         8102 |                          7610 |                         9614 |                   6017 |                         52239 |                      19676 |             1006 |              14042 |                     13232 |                  8832 |                         12568 |                   7361 |         2632 |                      8572 |                      8572 |             8966 |                  5168 |            15329 |             7379 |            15191 |            14471 |                      7296 |                      7296 |                      6660 |                       7068 |            10796 |             7997 |                              7472 |                   2305 |          796 |             3168 |             3168 |                   6317 |             6245 |             6188 | +0.00%       |       0.00 | +0.00%      | +0.00%      | +0.00%      |
|---------------------------------------------|------------------------------------------|--------------|--------------|------------------------|--------------|------------------------|--------------|--------------------------------|-------------------------|------------------------------|-----------------------|--------------------------------|---------------|-----------------------|-------------------------|-------------------------|--------------|--------------|------------------------------|--------------------|-------------------------|------------------------------|---------------|---------------------------|---------------------------|---------------|-------------------------|---------------|----------------|---------------|----------------------|----------------|-------------------------------|-----------------------------------|------------------|----------------------------|------------------------------|-------------------------------|------------------------------|------------------------|-------------------------------|----------------------------|------------------|--------------------|---------------------------|-----------------------|-------------------------------|------------------------|--------------|---------------------------|---------------------------|------------------|-----------------------|------------------|------------------|------------------|------------------|---------------------------|---------------------------|---------------------------|----------------------------|------------------|------------------|-----------------------------------|------------------------|--------------|------------------|------------------|------------------------|------------------|------------------|--------------|------------|-------------|-------------|-------------|
| mllib_check-global-chaining-limitto10_peano |                                      538 |         6397 |        11276 |                   1298 |          555 |                   1691 |         6365 |                           5381 |                    1421 |                         7619 |                  3907 |                           4518 |          2512 |                  4223 |                    2417 |                     611 |         1667 |         2787 |                         1454 |               6570 |                    8305 |                         2108 |         14779 |                      4293 |                      5685 |         29579 |                   44525 |         49361 |           9885 |         52433 |                16646 |          17959 |                          9965 |                             12196 |             4391 |                      21192 |                         8094 |                          7602 |                         9598 |                   6004 |                         52099 |                      19620 |             1003 |              13984 |                     13176 |                  8790 |                         12508 |                   7324 |         2616 |                      8516 |                      8516 |             8898 |                  5126 |            15189 |             7311 |            15051 |            14331 |                      7224 |                      7224 |                      6588 |                       6988 |            10657 |             7858 |                              7333 |                   2262 |          780 |             3101 |             3101 |                   6178 |             6106 |             6049 | -0.09%       |       0.42 | +0.00%      | +0.00%      | +0.00%      |
|---------------------------------------------|------------------------------------------|--------------|--------------|------------------------|--------------|------------------------|--------------|--------------------------------|-------------------------|------------------------------|-----------------------|--------------------------------|---------------|-----------------------|-------------------------|-------------------------|--------------|--------------|------------------------------|--------------------|-------------------------|------------------------------|---------------|---------------------------|---------------------------|---------------|-------------------------|---------------|----------------|---------------|----------------------|----------------|-------------------------------|-----------------------------------|------------------|----------------------------|------------------------------|-------------------------------|------------------------------|------------------------|-------------------------------|----------------------------|------------------|--------------------|---------------------------|-----------------------|-------------------------------|------------------------|--------------|---------------------------|---------------------------|------------------|-----------------------|------------------|------------------|------------------|------------------|---------------------------|---------------------------|---------------------------|----------------------------|------------------|------------------|-----------------------------------|------------------------|--------------|------------------|------------------|------------------------|------------------|------------------|--------------|------------|-------------|-------------|-------------|
| Total diff                                  | REGR(+1.32%)                             | REGR(+0.95%) | REGR(+0.71%) | REGR(+0.70%)           | REGR(+0.54%) | REGR(+0.54%)           | REGR(+0.44%) | REGR(+0.32%)                   | REGR(+0.28%)            | REGR(+0.22%)                 | REGR(+0.21%)          | REGR(+0.20%)                   | REGR(+0.20%)  | REGR(+0.19%)          | REGR(+0.17%)            | REGR(+0.16%)            | REGR(+0.12%) | SAME(+0.07%) | SAME(+0.07%)                 | SAME(+0.06%)       | SAME(+0.06%)            | SAME(+0.05%)                 | SAME(+0.03%)  | SAME(+0.02%)              | SAME(+0.02%)              | SAME(+0.02%)  | SAME(+0.02%)            | SAME(+0.01%)  | SAME(+0.01%)   | SAME(+0.01%)  | SAME(+0.01%)         | SAME(+0.01%)   | SAME(-0.02%)                  | SAME(-0.03%)                      | SAME(-0.07%)     | SAME(-0.08%)               | SAME(-0.10%)                 | IMPR(-0.11%)                  | IMPR(-0.17%)                 | IMPR(-0.22%)           | IMPR(-0.27%)                  | IMPR(-0.28%)               | IMPR(-0.30%)     | IMPR(-0.41%)       | IMPR(-0.42%)              | IMPR(-0.48%)          | IMPR(-0.48%)                  | IMPR(-0.50%)           | IMPR(-0.61%) | IMPR(-0.65%)              | IMPR(-0.65%)              | IMPR(-0.76%)     | IMPR(-0.81%)          | IMPR(-0.91%)     | IMPR(-0.92%)     | IMPR(-0.92%)     | IMPR(-0.97%)     | IMPR(-0.99%)              | IMPR(-0.99%)              | IMPR(-1.08%)              | IMPR(-1.13%)               | IMPR(-1.29%)     | IMPR(-1.74%)     | IMPR(-1.86%)                      | IMPR(-1.87%)           | IMPR(-2.01%) | IMPR(-2.11%)     | IMPR(-2.11%)     | IMPR(-2.20%)           | IMPR(-2.23%)     | IMPR(-2.25%)     | -0.09%       |       0.42 | +0.00%      | +0.00%      | +0.00%      |
|---------------------------------------------|------------------------------------------|--------------|--------------|------------------------|--------------|------------------------|--------------|--------------------------------|-------------------------|------------------------------|-----------------------|--------------------------------|---------------|-----------------------|-------------------------|-------------------------|--------------|--------------|------------------------------|--------------------|-------------------------|------------------------------|---------------|---------------------------|---------------------------|---------------|-------------------------|---------------|----------------|---------------|----------------------|----------------|-------------------------------|-----------------------------------|------------------|----------------------------|------------------------------|-------------------------------|------------------------------|------------------------|-------------------------------|----------------------------|------------------|--------------------|---------------------------|-----------------------|-------------------------------|------------------------|--------------|---------------------------|---------------------------|------------------|-----------------------|------------------|------------------|------------------|------------------|---------------------------|---------------------------|---------------------------|----------------------------|------------------|------------------|-----------------------------------|------------------------|--------------|------------------|------------------|------------------------|------------------|------------------|--------------|------------|-------------|-------------|-------------|
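
For reading the "Total diff" row, the sketch below recomputes an entry from the two rows above it. The relative-difference formula matches the reported values; the ±0.10% band separating SAME from REGR/IMPR is inferred from the table (for example, -0.10% is labelled SAME and -0.11% is labelled IMPR) and is an assumption, as is the function name `total_diff`.

```python
def total_diff(baseline: int, candidate: int, same_band: float = 0.10) -> str:
    """Format the relative change of `candidate` versus `baseline`.

    `same_band` (in percent) is an assumed threshold inferred from the table,
    not a documented setting of the QoR tooling.
    """
    pct = (candidate - baseline) / baseline * 100.0
    if pct > same_band:
        label = "REGR"   # more instructions than the baseline
    elif pct < -same_band:
        label = "IMPR"   # fewer instructions than the baseline
    else:
        label = "SAME"
    return f"{label}({pct:+.2f}%)"


# CompareOps_K_EQ_GE_GT_LE_LT_CMP_GT_int32: 531 -> 538 instructions
print(total_diff(531, 538))
# ReduceMin_int8_0: 6188 -> 6049 instructions
print(total_diff(6188, 6049))
```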

|---------------------------------------------|--------------|--------------------------------|--------------------------------|------------------------------|-----------------------|-------------------------|-----------------------|-----------------------|---------------------|-------------------------|-------------------------|-----------------------|-----------------------|---------------|--------------------|---------------------|---------------------|---------------|----------------------|--------------|-------------------------------|------------------------------|------------------------------|--------------|-------------------------|-------------------------|--------------|--------------|------------------------|------------------------|-----------------------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|----------------------|------------------------------|------------------|-------------------------|-------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|------------------------|------------------------|--------------|--------------------------------------|----------------------------------------|---------------------------------------------------------|-----------------------------------------------------------|---------------------------------------------|--------------------------------------------
-|------------------|-------------------------------|-----------------------|-----------------------|--------------------------------------|-----------------------------------------------------|----------------------------------------------|-----------------------------------------|-------------------------------------------------------|--------------|-------------------------------|------------------|------------------|--------------------|------------------------|------------------------|----------------------------|--------------------|----------------------------|---------------------------|------------------|------------------|----------------------------|---------------------------|---------------------------|---------------------------|---------------------------|---------------------------|--------------|--------------|-----------------------|-----------------------|--------------|------------|-------------|-------------|-------------|
| Core_PMSize                                 | MaxPool2D_1  | Conv2D_Transpose_bfp16_AIE2p_0 | Conv2D_Transpose_bfp16_AIE2p_1 | Conv2D_Transpose_bf16_AIE2_1 | Conv2D_7x7s2_Layer1_1 | Conv2D_11x11s4_Layer1_0 | DegroupG4_aie2_int8_0 | DegroupG4_aie2_int8_1 | GroupG4_aie2_bf16_0 | MaxPool2dVariant_bf16_0 | MaxPool2dVariant_bf16_1 | DegroupG8_aie2_int8_0 | DegroupG8_aie2_int8_1 | Conv2DA16W8_1 | Cumsum_AIE2ps_int8 | GroupG8_aie2_bf16_0 | GroupG8_aie2_bf16_1 | Conv2DA16W8_0 | Cumsum_AIE2_bfloat16 | GemmA16W4_1  | InterpolateLinear1D_AIE2_int8 | MaxPool2dVariant_aie2_int8_0 | MaxPool2dVariant_aie2_int8_1 | GemmA16W4_0  | Conv2D_Transpose_int8_1 | Conv2D_Transpose_int8_0 | GemmA16A16_0 | GemmA16A16_1 | ReduceProd_bf16_0_AIE2 | ReduceProd_bf16_1_AIE2 | ReduceMeanTemplated_AIE2_bfloat16 | ReduceSum_bf16_0 | ReduceMax_bf16_0 | ReduceMax_bf16_1 | ReduceMin_bf16_0 | ReduceMin_bf16_1 | ReduceMax_int8_0 | ReduceMax_int8_1 | ReduceMin_int8_0 | ReduceMin_int8_1 | Expand_aie2_bfloat16 | LayerNormC8Part2_aie2_bf16_0 | Tile_aie2_bf16_0 | AvgPool2dVariant_bf16_1 | AvgPool2dVariant_bf16_0 | Transpose_aie2_int8_021 | Transpose_aie2_int8_021_pad | Transpose_aie2_int8_102 | Transpose_aie2_int8_102_pad | Transpose_aie2_int8_120 | Transpose_aie2_int8_120_pad | Transpose_aie2_int8_201 | Transpose_aie2_int8_201_pad | Transpose_aie2_int8_210 | Transpose_aie2_int8_210_pad | Transpose_aie2_bf16_021 | Transpose_aie2_bf16_021_pad | Transpose_aie2_bf16_102 | Transpose_aie2_bf16_102_pad | Transpose_aie2_bf16_120 | Transpose_aie2_bf16_120_pad | Transpose_aie2_bf16_201 | Transpose_aie2_bf16_201_pad | Transpose_aie2_bf16_210 | Transpose_aie2_bf16_210_pad | Conv1D_DW_AIE2p_int8_0 | Conv1D_DW_AIE2p_int8_1 | Slice_int8_0 | CompareOpsAttributeBroadcasting_bf16 | CompareOpsAttributeBroadcasting_bf16_1 | CompareOpsBroadcasting_K_EQ_GE_GT_LE_LT_CMP_GE_bfloat16 | CompareOpsBroadcasting_K_EQ_GE_GT_LE_LT_CMP_GE_bfloat16_1 | CompareOps_K_EQ_GE_GT_LE_LT_CMP_EQ_bfloat16 | 
CompareOps_K_EQ_GE_GT_LE_LT_CMP_GE_bfloat16 | Slice_bfloat16_0 | InstanceNormPart2_aie2_bf16_0 | Conv1D_DW_AIE2_bf16_0 | Conv1D_DW_AIE2_bf16_1 | CompareOpsAttributeBroadcasting_int8 | CompareOpsBroadcasting_K_EQ_GE_GT_LE_LT_CMP_GE_int8 | CompareOps_K_EQ_GE_GT_LE_LT_CMP_EQ_int8_aie2 | CompareOps_K_EQ_GE_GT_LE_LT_CMP_GE_int8 | CompareOps_K_EQ_GE_GT_LE_LT_CMP_GE_int8_ptr_interface | GemmA16W8_1  | ReduceMeanTemplated_AIE2_int8 | ReduceSum_int8_0 | ReduceSum_int8_1 | Conv2D_bfp16_OC8_0 | Conv2D_bf16_FC_AIE2p_1 | Conv2D_bf16_FC_AIE2p_0 | ReduceProdAxis_1_aie2_bf16 | Conv2D_bfp16_OC8_1 | ReduceMeanAxis_1_aie2_bf16 | ReduceSumAxis_1_aie2_bf16 | Conv2D_DW_bf16_0 | Conv2D_DW_bf16_1 | ReduceMeanAxis_1_aie2_int8 | ReduceMaxAxis_1_aie2_bf16 | ReduceMinAxis_1_aie2_bf16 | ReduceSumAxis_1_aie2_int8 | ReduceMaxAxis_1_aie2_int8 | ReduceMinAxis_1_aie2_int8 | Conv2D_DW_0  | Conv2D_DW_1  | Range_bfloat16_aie2_0 | Range_bfloat16_aie2_1 | Average diff | Diff stdev | Quantile #1 | Quantile #2 | Quantile #3 |
|---------------------------------------------|--------------|--------------------------------|--------------------------------|------------------------------|-----------------------|-------------------------|-----------------------|-----------------------|---------------------|-------------------------|-------------------------|-----------------------|-----------------------|---------------|--------------------|---------------------|---------------------|---------------|----------------------|--------------|-------------------------------|------------------------------|------------------------------|--------------|-------------------------|-------------------------|--------------|--------------|------------------------|------------------------|-----------------------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|----------------------|------------------------------|------------------|-------------------------|-------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|------------------------|------------------------|--------------|--------------------------------------|----------------------------------------|---------------------------------------------------------|-----------------------------------------------------------|---------------------------------------------|--------------------------------------------
-|------------------|-------------------------------|-----------------------|-----------------------|--------------------------------------|-----------------------------------------------------|----------------------------------------------|-----------------------------------------|-------------------------------------------------------|--------------|-------------------------------|------------------|------------------|--------------------|------------------------|------------------------|----------------------------|--------------------|----------------------------|---------------------------|------------------|------------------|----------------------------|---------------------------|---------------------------|---------------------------|---------------------------|---------------------------|--------------|--------------|-----------------------|-----------------------|--------------|------------|-------------|-------------|-------------|
| mllib_check-vpush5_peano                    |         2564 |                           5684 |                           5828 |                         5972 |                  6180 |                    6180 |                  2740 |                  2740 |                2772 |                    2772 |                    2788 |                  2852 |                  2852 |          5732 |               2948 |                2996 |                2996 |          6852 |                 3460 |         3636 |                          3652 |                         3956 |                         3956 |         6404 |                    9060 |                    9236 |         5972 |         5972 |                   5556 |                   5556 |                              4708 |             4516 |             4276 |             4276 |             4276 |             4276 |             4260 |             4260 |             4260 |             4260 |                 4164 |                         4164 |             4164 |                    4148 |                    4132 |                    4100 |                        4100 |                    4100 |                        4100 |                    4100 |                        4100 |                    4100 |                        4100 |                    4100 |                        4100 |                    4068 |                        4068 |                    4068 |                        4068 |                    4068 |                        4068 |                    4068 |                        4068 |                    4068 |                        4068 |                   3332 |                   3332 |         3108 |                                 3060 |                                   3060 |                                                    3060 |                                                      3060 |                                        3060 |                                        
3060 |             3012 |                          2948 |                  2932 |                  2932 |                                 2804 |                                                2804 |                                         2804 |                                    2804 |                                                  2804 |         7988 |                          4660 |             4580 |             4580 |               6948 |                   5732 |                   5620 |                       8340 |               7364 |                       6724 |                      6724 |             3252 |             3252 |                       6292 |                      6180 |                      6180 |                      6180 |                      6148 |                      6148 |         3860 |         3860 |                  3620 |                  3620 | +0.00%       |       0.00 | +0.00%      | +0.00%      | +0.00%      |
|---------------------------------------------|--------------|--------------------------------|--------------------------------|------------------------------|-----------------------|-------------------------|-----------------------|-----------------------|---------------------|-------------------------|-------------------------|-----------------------|-----------------------|---------------|--------------------|---------------------|---------------------|---------------|----------------------|--------------|-------------------------------|------------------------------|------------------------------|--------------|-------------------------|-------------------------|--------------|--------------|------------------------|------------------------|-----------------------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|----------------------|------------------------------|------------------|-------------------------|-------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|------------------------|------------------------|--------------|--------------------------------------|----------------------------------------|---------------------------------------------------------|-----------------------------------------------------------|---------------------------------------------|--------------------------------------------
-|------------------|-------------------------------|-----------------------|-----------------------|--------------------------------------|-----------------------------------------------------|----------------------------------------------|-----------------------------------------|-------------------------------------------------------|--------------|-------------------------------|------------------|------------------|--------------------|------------------------|------------------------|----------------------------|--------------------|----------------------------|---------------------------|------------------|------------------|----------------------------|---------------------------|---------------------------|---------------------------|---------------------------|---------------------------|--------------|--------------|-----------------------|-----------------------|--------------|------------|-------------|-------------|-------------|
| mllib_check-global-chaining-limitto10_peano |         2612 |                           5748 |                           5892 |                         6036 |                  6244 |                    6244 |                  2756 |                  2756 |                2788 |                    2788 |                    2804 |                  2868 |                  2868 |          5764 |               2964 |                3012 |                3012 |          6884 |                 3476 |         3652 |                          3668 |                         3972 |                         3972 |         6420 |                    9076 |                    9252 |         5956 |         5956 |                   5540 |                   5540 |                              4692 |             4500 |             4260 |             4260 |             4260 |             4260 |             4244 |             4244 |             4244 |             4244 |                 4148 |                         4148 |             4148 |                    4132 |                    4116 |                    4084 |                        4084 |                    4084 |                        4084 |                    4084 |                        4084 |                    4084 |                        4084 |                    4084 |                        4084 |                    4052 |                        4052 |                    4052 |                        4052 |                    4052 |                        4052 |                    4052 |                        4052 |                    4052 |                        4052 |                   3316 |                   3316 |         3092 |                                 3044 |                                   3044 |                                                    3044 |                                                      3044 |                                        3044 |                                        
3044 |             2996 |                          2932 |                  2916 |                  2916 |                                 2788 |                                                2788 |                                         2788 |                                    2788 |                                                  2788 |         7940 |                          4628 |             4548 |             4548 |               6884 |                   5668 |                   5556 |                       8244 |               7268 |                       6628 |                      6628 |             3204 |             3204 |                       6196 |                      6068 |                      6068 |                      6068 |                      6036 |                      6036 |         3780 |         3780 |                  3540 |                  3540 | -0.13%       |       0.49 | -0.29%      | +0.00%      | +0.00%      |
|---------------------------------------------|--------------|--------------------------------|--------------------------------|------------------------------|-----------------------|-------------------------|-----------------------|-----------------------|---------------------|-------------------------|-------------------------|-----------------------|-----------------------|---------------|--------------------|---------------------|---------------------|---------------|----------------------|--------------|-------------------------------|------------------------------|------------------------------|--------------|-------------------------|-------------------------|--------------|--------------|------------------------|------------------------|-----------------------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|----------------------|------------------------------|------------------|-------------------------|-------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|------------------------|------------------------|--------------|--------------------------------------|----------------------------------------|---------------------------------------------------------|-----------------------------------------------------------|---------------------------------------------|--------------------------------------------
-|------------------|-------------------------------|-----------------------|-----------------------|--------------------------------------|-----------------------------------------------------|----------------------------------------------|-----------------------------------------|-------------------------------------------------------|--------------|-------------------------------|------------------|------------------|--------------------|------------------------|------------------------|----------------------------|--------------------|----------------------------|---------------------------|------------------|------------------|----------------------------|---------------------------|---------------------------|---------------------------|---------------------------|---------------------------|--------------|--------------|-----------------------|-----------------------|--------------|------------|-------------|-------------|-------------|
| Total diff                                  | REGR(+1.87%) | REGR(+1.13%)                   | REGR(+1.10%)                   | REGR(+1.07%)                 | REGR(+1.04%)          | REGR(+1.04%)            | REGR(+0.58%)          | REGR(+0.58%)          | REGR(+0.58%)        | REGR(+0.58%)            | REGR(+0.57%)            | REGR(+0.56%)          | REGR(+0.56%)          | REGR(+0.56%)  | REGR(+0.54%)       | REGR(+0.53%)        | REGR(+0.53%)        | REGR(+0.47%)  | REGR(+0.46%)         | REGR(+0.44%) | REGR(+0.44%)                  | REGR(+0.40%)                 | REGR(+0.40%)                 | REGR(+0.25%) | REGR(+0.18%)            | REGR(+0.17%)            | IMPR(-0.27%) | IMPR(-0.27%) | IMPR(-0.29%)           | IMPR(-0.29%)           | IMPR(-0.34%)                      | IMPR(-0.35%)     | IMPR(-0.37%)     | IMPR(-0.37%)     | IMPR(-0.37%)     | IMPR(-0.37%)     | IMPR(-0.38%)     | IMPR(-0.38%)     | IMPR(-0.38%)     | IMPR(-0.38%)     | IMPR(-0.38%)         | IMPR(-0.38%)                 | IMPR(-0.38%)     | IMPR(-0.39%)            | IMPR(-0.39%)            | IMPR(-0.39%)            | IMPR(-0.39%)                | IMPR(-0.39%)            | IMPR(-0.39%)                | IMPR(-0.39%)            | IMPR(-0.39%)                | IMPR(-0.39%)            | IMPR(-0.39%)                | IMPR(-0.39%)            | IMPR(-0.39%)                | IMPR(-0.39%)            | IMPR(-0.39%)                | IMPR(-0.39%)            | IMPR(-0.39%)                | IMPR(-0.39%)            | IMPR(-0.39%)                | IMPR(-0.39%)            | IMPR(-0.39%)                | IMPR(-0.39%)            | IMPR(-0.39%)                | IMPR(-0.48%)           | IMPR(-0.48%)           | IMPR(-0.51%) | IMPR(-0.52%)                         | IMPR(-0.52%)                           | IMPR(-0.52%)                                            | IMPR(-0.52%)                                              | IMPR(-0.52%)                                | IMPR(-0.52%)                               
 | IMPR(-0.53%)     | IMPR(-0.54%)                  | IMPR(-0.55%)          | IMPR(-0.55%)          | IMPR(-0.57%)                         | IMPR(-0.57%)                                        | IMPR(-0.57%)                                 | IMPR(-0.57%)                            | IMPR(-0.57%)                                          | IMPR(-0.60%) | IMPR(-0.69%)                  | IMPR(-0.70%)     | IMPR(-0.70%)     | IMPR(-0.92%)       | IMPR(-1.12%)           | IMPR(-1.14%)           | IMPR(-1.15%)               | IMPR(-1.30%)       | IMPR(-1.43%)               | IMPR(-1.43%)              | IMPR(-1.48%)     | IMPR(-1.48%)     | IMPR(-1.53%)               | IMPR(-1.81%)              | IMPR(-1.81%)              | IMPR(-1.81%)              | IMPR(-1.82%)              | IMPR(-1.82%)              | IMPR(-2.07%) | IMPR(-2.07%) | IMPR(-2.21%)          | IMPR(-2.21%)          | -0.13%       |       0.49 | -0.29%      | +0.00%      | +0.00%      |
|---------------------------------------------|--------------|--------------------------------|--------------------------------|------------------------------|-----------------------|-------------------------|-----------------------|-----------------------|---------------------|-------------------------|-------------------------|-----------------------|-----------------------|---------------|--------------------|---------------------|---------------------|---------------|----------------------|--------------|-------------------------------|------------------------------|------------------------------|--------------|-------------------------|-------------------------|--------------|--------------|------------------------|------------------------|-----------------------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|------------------|----------------------|------------------------------|------------------|-------------------------|-------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|-------------------------|-----------------------------|------------------------|------------------------|--------------|--------------------------------------|----------------------------------------|---------------------------------------------------------|-----------------------------------------------------------|---------------------------------------------|--------------------------------------------
-|------------------|-------------------------------|-----------------------|-----------------------|--------------------------------------|-----------------------------------------------------|----------------------------------------------|-----------------------------------------|-------------------------------------------------------|--------------|-------------------------------|------------------|------------------|--------------------|------------------------|------------------------|----------------------------|--------------------|----------------------------|---------------------------|------------------|------------------|----------------------------|---------------------------|---------------------------|---------------------------|---------------------------|---------------------------|--------------|--------------|-----------------------|-----------------------|--------------|------------|-------------|-------------|-------------|

Stack size is unchanged in all kernels.

…ristic

This allows chaining when the base pointer is used in other basic blocks,
but only when it is considered profitable:
- This shouldn't happen in a loop, as the resulting copy will be more costly.
- The cost of chaining is incremented for each offset falling outside the
  load/store immediate ranges.
- An experimental threshold is used to determine if chaining is
  profitable based on the computed cost (e.g. half the number of pointer
  adds to be chained).
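
As a reading aid, the three rules above can be sketched as a small predicate. This is a minimal sketch under stated assumptions, not the code in this PR: the function name, its parameters, the `<=` comparison, and the optional override (standing in for a command-line limit) are all hypothetical. It assumes the caller has already established that the base pointer is used in other basic blocks.

```cpp
#include <optional>

// Hypothetical helper, not taken from the PR: decides whether chaining the
// pointer adds of one base pointer is worthwhile.
//   NumPtrAdds            number of pointer adds that would be chained
//   NumOutOfRangeOffsets  offsets that do not fit the load/store immediate
//   InLoop                whether the block is part of a loop
//   CostLimitOverride     assumed stand-in for a user-provided cost limit
bool isChainingProfitable(unsigned NumPtrAdds, unsigned NumOutOfRangeOffsets,
                          bool InLoop,
                          std::optional<unsigned> CostLimitOverride = std::nullopt) {
  // Rule 1: never chain in a loop, the extra copy would run every iteration.
  if (InLoop)
    return false;
  // Rule 3: default threshold is half the number of pointer adds.
  const unsigned CostLimit = CostLimitOverride.value_or(NumPtrAdds / 2);
  // Rule 2: the cost is one unit per offset outside the immediate range.
  return NumOutOfRangeOffsets <= CostLimit;
}
```

With four pointer adds, one out-of-range offset stays under the default limit of two, while three do not.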
khallouh force-pushed the hamza.cluster.base.addr branch from 50a42df to 5230bcf on April 28, 2025 15:00
return true;
} else {
if (MemType.getSizeInBits() <= 32) {
ImmediateRangeMax = TII->getLoadStorePostIncImmediateRange(MemType)

nit: one query to TII is enough.

// assume chaining is always profitable.
if (MemType.isVector()) {
return true;
} else {

nit: remove the else; it will reduce the indentation.


// If the immediate range is not set, the pointers aren't used by any
// loads and stores, so we return.
if (!ImmediateRangeSet) {

Is this dead code? This is covered by:

llvm_unreachable(
  "unreachable: Unsupported immediate range of scalar size ");

Should we reuse that else to just return false?

continue;
}

const int64_t CurrOffset = OffsetMI->Value.getSExtValue();

Maybe we could include some description of the idea (also as a future reference).

// assume chaining is always profitable.
if (MemType.isVector()) {
return true;
} else {

You can assert MemType.getSizeInBits() <= 32 and remove the next if as well.
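
A minimal sketch of the shape this suggestion points at, assuming a hypothetical `MemTypeInfo` stand-in for `LLT` and the `{28, -32}` bounds shown elsewhere in this diff for accesses of at most 32 bits. The function name is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for LLT; only the two properties used here.
struct MemTypeInfo {
  bool IsVector;
  unsigned SizeInBits;
};

// Early return for vectors, then an assert instead of a nested if/else.
bool offsetFitsPostInc(MemTypeInfo MemType, int64_t Offset) {
  // Vector accesses: chaining is assumed to be always profitable.
  if (MemType.IsVector)
    return true;

  // No else branch needed after the early return; the scalar size is a
  // precondition rather than a runtime branch.
  assert(MemType.SizeInBits <= 32 && "unsupported scalar size");

  // Bounds assumed from the diff: max 28, min -32.
  return Offset >= -32 && Offset <= 28;
}
```

The trade-off is that an unsupported size becomes a programming error in asserts-enabled builds instead of a silently handled case.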

int64_t ImmediateRangeMax;
int64_t ImmediateRangeMin;
};
virtual ImmediateRangeBounds

Can we simply use std::pair?
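
For comparison, a small sketch of both options. The struct mirrors the field names in the diff; the two factory functions and the `inRange*` helpers are hypothetical and only exist to show the call sites.

```cpp
#include <cstdint>
#include <utility>

// Named struct, with the field names used in the diff.
struct ImmediateRangeBounds {
  int64_t ImmediateRangeMax;
  int64_t ImmediateRangeMin;
};

// Hypothetical providers returning the same bounds in both shapes.
ImmediateRangeBounds boundsAsStruct() { return {28, -32}; }
std::pair<int64_t, int64_t> boundsAsPair() { return {28, -32}; }

bool inRangeStruct(int64_t Offset) {
  const ImmediateRangeBounds B = boundsAsStruct();
  return Offset >= B.ImmediateRangeMin && Offset <= B.ImmediateRangeMax;
}

bool inRangePair(int64_t Offset) {
  // The order (max, min) is only a convention here; nothing in the type
  // prevents a caller from reading the pair as (min, max).
  const auto [Max, Min] = boundsAsPair();
  return Offset >= Min && Offset <= Max;
}
```

Both work with structured bindings; the pair saves a type declaration, while the struct keeps the max/min order visible at every use.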

@@ -72,6 +75,11 @@ static cl::opt<bool> EnableChainsForVectorLdSt(
"aie-chain-addr-vec-ldst", cl::Hidden, cl::init(true),
cl::desc("Enable ptradd chaining for vector loads and stores."));

cl::opt<int> AddressChainCostLimit(

Nit: can we make this static?

// If the base reg is used in any of the successive MBBs, would introduce a
// COPY and increase reg pressure. We only skip chaining in this case if it
// is considered unprofitable.
if (isRegUsedInSuccessiveMBBs(&MBB, PtrReg) &&

Can we move isRegUsedInSuccessiveMBBs inside the profitability check?

ChainedCostLimit = AddressChainCostLimit;
}

if (isRegUsedInSuccessiveMBBs(&MBB, PtrReg)) {

We are already checking isRegUsedInSuccessiveMBBs, right?

assert(Instrs.size() > 1);

bool InLoop = true;
MachineLoopInfo &MLI = getAnalysis<MachineLoopInfo>();

Can we use AIELoopUtils::isSingleMBBLoop? Then there is no need to include MachineLoopInfo.

return false;
}

auto OffsetMI =

nit: maybe OffsetValue.

# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
#
# (c) Copyright 2025 Advanced Micro Devices, Inc. or its affiliates
# RUN: llc -mtriple aie2p -start-before=aie-cluster-base-address -stop-after=postmisched --issue-limit=6 --aie-chain-addr-scl-ldst=true %s -verify-machineinstrs -o - | FileCheck %s

Aren't --issue-limit=6 --aie-chain-addr-scl-ldst=true the default options?

Offsets.push_back(
(I == 0 || (std::holds_alternative<std::string>(Offsets.back()) &&
std::get<std::string>(Offsets.back()) == "Break"))
? CurrOffset

Looks like CurrOffset is an integer value.


NewOffset = NextOffset - AccumulatedOffset;

if (NewOffset < ImmediateRangeMin || NewOffset > ImmediateRangeMax) {

check: are we supposed to check AccumulatedOffset against the immediate range?
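
To make the question concrete, here is a minimal sketch of the delta-based check in the fragment above. It assumes that once a pointer add is chained, the running pointer sits at the previous absolute offset, so the next immediate is the difference to that position. The function name and the update of `AccumulatedOffset` are assumptions; only `NewOffset = NextOffset - AccumulatedOffset` and the range comparison come from the diff.

```cpp
#include <cstdint>
#include <vector>

// Counts how many chained pointer-add immediates fall outside [Min, Max].
// AbsOffsets holds the original offsets relative to the common base pointer.
int countOutOfRangeDeltas(const std::vector<int64_t> &AbsOffsets, int64_t Min,
                          int64_t Max) {
  int OutOfRange = 0;
  int64_t AccumulatedOffset = 0;
  for (int64_t NextOffset : AbsOffsets) {
    // Immediate needed to get from the current chained pointer to the target.
    const int64_t NewOffset = NextOffset - AccumulatedOffset;
    if (NewOffset < Min || NewOffset > Max)
      ++OutOfRange;
    // Assumption: after chaining, the pointer sits at NextOffset.
    AccumulatedOffset = NextOffset;
  }
  return OutOfRange;
}
```

For offsets 16, 40, 64, 200 and a range of [-32, 28], the deltas are 16, 24, 24, 136, so only one immediate is out of range, whereas three of the absolute offsets would be.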

return {14, -16};
else if (MemType.getSizeInBits() <= 32)
return {28, -32};
else

We have a memory type for s128 as well, right?

Can we gracefully exit, rather than asserting?
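
One possible shape for a graceful exit, sketched with `std::optional`. This is an assumption about how it could look, not the PR's code: the function names and the `unsigned SizeInBits` parameter are hypothetical, and only the `{28, -32}` case for sizes up to 32 bits is reproduced, because the condition guarding the narrower `{14, -16}` case is not visible in the fragment above.

```cpp
#include <cstdint>
#include <optional>

// Field names follow the diff.
struct ImmediateRangeBounds {
  int64_t ImmediateRangeMax;
  int64_t ImmediateRangeMin;
};

// Hypothetical query: returns the bounds for supported scalar sizes and
// std::nullopt for anything else (for example s128) instead of asserting.
std::optional<ImmediateRangeBounds> getScalarPostIncRange(unsigned SizeInBits) {
  if (SizeInBits <= 32)
    return ImmediateRangeBounds{28, -32};
  return std::nullopt;
}

// Caller side: an unsupported size simply means "do not chain".
bool canChainWithOffset(unsigned SizeInBits, int64_t Offset) {
  const std::optional<ImmediateRangeBounds> Range =
      getScalarPostIncRange(SizeInBits);
  if (!Range)
    return false;
  return Offset >= Range->ImmediateRangeMin &&
         Offset <= Range->ImmediateRangeMax;
}
```

The pass then degrades to "no chaining" for unknown memory types rather than aborting the compilation.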

@@ -1650,3 +1650,15 @@ unsigned AIE2InstrInfo::getBasicVectorBitSize() const { return 512; }
unsigned AIE2InstrInfo::getMaxVectorBitSize() const { return 1024; }

unsigned AIE2InstrInfo::getMaxSupportedLdStIncSize() const { return 512; }

AIEBaseInstrInfo::ImmediateRangeBounds
AIE2InstrInfo::getLoadStorePostIncImmediateRange(LLT MemType) const {

@F-Stuckmann F-Stuckmann May 7, 2025

I will open a PR in which I have the immediate range checks for all the memory sizes.
You are missing the vector size checks and the checks for different loads, such as G_ZEXT_LOAD, which are pitifully small.

mgehre-amd pushed a commit that referenced this pull request Aug 21, 2025
[AutoBump] Merge with fixes of 0a17bdf (Oct 15) (14) (Needs ONNX Bump)(Needs downstream changes)(Needs Torch Bump)