
[VPlan] Remove VPVectorPointer for part 0 after unrolling. #149735


Open
fhahn wants to merge 1 commit into main from vplan-more-unroll-simps

Conversation

fhahn (Contributor) commented Jul 20, 2025

VPVectorPointer for part 0 is just the pointer operand. Simplify it after unrolling. This removes a large number of redundant GEPs with index 0.
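To illustrate the idea outside of the VPlan sources, here is a minimal, self-contained C++ sketch (toy types only, not LLVM's actual VPValue/recipe classes; every name in it is made up for illustration). After unrolling, the vector pointer for part 0 computes nothing beyond its base-pointer operand, so all of its uses can be rewired to that operand, which is what drops the trailing "getelementptr ..., i32 0" instructions from the generated IR.

#include <cstdio>
#include <vector>

// Toy stand-ins for VPlan values/recipes (hypothetical, illustration only).
struct ToyValue {
  const char *Name;
  std::vector<ToyValue *> Operands;
  ToyValue(const char *N, std::vector<ToyValue *> Ops = {})
      : Name(N), Operands(std::move(Ops)) {}
  virtual ~ToyValue() = default;
};

// A per-part vector pointer; operand 0 is the base pointer.
struct ToyVectorPointer : ToyValue {
  unsigned Part;
  ToyVectorPointer(ToyValue *Base, unsigned P)
      : ToyValue("vector-pointer", {Base}), Part(P) {}
};

// Stand-in for replaceAllUsesWith: rewire every operand equal to Old to New.
static void replaceAllUsesWith(std::vector<ToyValue *> &Recipes, ToyValue *Old,
                               ToyValue *New) {
  for (ToyValue *R : Recipes)
    for (ToyValue *&Op : R->Operands)
      if (Op == Old)
        Op = New;
}

int main() {
  ToyValue Base("%gep.src");
  ToyVectorPointer Ptr0(&Base, 0); // would emit getelementptr %gep.src, i32 0
  ToyVectorPointer Ptr1(&Base, 1); // offset by VF for part 1, still needed
  ToyValue Load0("wide.load.0", {&Ptr0});
  ToyValue Load1("wide.load.1", {&Ptr1});
  std::vector<ToyValue *> Recipes = {&Ptr0, &Ptr1, &Load0, &Load1};

  // The simplification: the part-0 vector pointer is just its operand, so
  // replace all of its uses with the base pointer.
  for (ToyValue *R : Recipes)
    if (auto *VP = dynamic_cast<ToyVectorPointer *>(R))
      if (VP->Part == 0)
        replaceAllUsesWith(Recipes, VP, VP->Operands[0]);

  std::printf("load 0 now reads directly from %s\n", Load0.Operands[0]->Name);
  std::printf("load 1 still reads from %s (part %u)\n",
              Load1.Operands[0]->Name, Ptr1.Part);
  return 0;
}

Running the sketch prints that load 0 now reads directly from %gep.src while load 1 keeps its part-1 vector pointer, mirroring the before/after pattern in the test updates below.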

llvmbot (Member) commented Jul 20, 2025

@llvm/pr-subscribers-backend-systemz
@llvm/pr-subscribers-vectorizers
@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-backend-risc-v

Author: Florian Hahn (fhahn)

Changes

VPVectorPointer for part 0 is just the pointer operand. Simplify it after unrolling. This removes a large number of redundant GEPs with index 0.


Patch is 2.18 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/149735.diff

289 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+2)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+8)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll (+6-12)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll (+3-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/eliminate-tail-predication.ll (+3-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/epilog-iv-select-cmp.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-factors.ll (+18-36)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-widen-inductions.ll (+12-24)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/extractvalue-no-scalarization-required.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence-fold-tail.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/fixed-order-recurrence.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/fmax-without-fast-math-flags.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/fmin-without-fast-math-flags.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/fminimumnum.ll (+18-36)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/force-target-instruction-cost.ll (+5-8)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/gather-do-not-vectorize-addressing.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/induction-costs-sve.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/induction-costs.ll (+3-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/interleave-allocsize-not-equal-typesize.ll (+5-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/interleaving-load-store.ll (+8-16)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/interleaving-reduction.ll (+66-10)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/licm-calls.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/loop-vectorization-factors.ll (+149-177)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/low_trip_count_predicates.ll (+11-17)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/optsize_minsize.ll (+98-126)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-chained.ll (+78-156)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-epilogue.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-mixed.ll (+16-32)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product-neon.ll (+54-108)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-dot-product.ll (+88-176)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-interleave.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-no-dotprod.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce-sub.ll (+6-12)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/partial-reduce.ll (+30-60)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/reduction-recurrence-costs-sve.ll (+3-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-fp-ext-trunc-illegal-type.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-reduction-inloop-cond.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll (+522-554)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/simple_early_exit.ll (+9-18)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/single-early-exit-interleave.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/store-costs-sve.ll (+6-12)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/streaming-compatible-sve-no-maximize-bandwidth.ll (+6-12)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-inloop-reductions.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-no-remaining-iterations.ll (+6-8)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-reductions.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-strict-reductions.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect.ll (+12-24)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-fixed-width-inorder-core.ll (+6-12)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-fneg.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions-unusual-types.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-inv-store.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-live-out-pointer-induction.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-multiexit.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-runtime-check-size-based-threshold.ll (+8-12)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-forced.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-optsize.ll (+3-5)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-reductions.ll (+8-16)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-unroll.ll (+3-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding.ll (+16-30)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-vscale-based-trip-counts.ll (+15-26)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-gep.ll (+5-40)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/synthesize-mask-for-call.ll (+6-12)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/tail-folding-styles.ll (+5-10)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-cost.ll (+3-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-unroll.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll (+33-64)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/type-shrinkage-insertelt.ll (+33-35)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/vector-loop-backedge-elimination-epilogue.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/vplan-printing.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/widen-call-with-intrinsic-or-libfunc.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/ARM/mve-gather-scatter-tailpred.ll (+10-20)
  • (modified) llvm/test/Transforms/LoopVectorize/ARM/mve-hoist-runtime-checks.ll (+7-9)
  • (modified) llvm/test/Transforms/LoopVectorize/ARM/mve-multiexit.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/ARM/mve-reduction-types.ll (+15-30)
  • (modified) llvm/test/Transforms/LoopVectorize/ARM/optsize_minsize.ll (+95-120)
  • (modified) llvm/test/Transforms/LoopVectorize/ARM/sphinx.ll (+3-6)
  • (modified) llvm/test/Transforms/LoopVectorize/ARM/tail-folding-not-allowed.ll (+25-50)
  • (modified) llvm/test/Transforms/LoopVectorize/ARM/tail-folding-scalar-epilogue-fallback.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/LoongArch/defaults.ll (+2-3)
  • (modified) llvm/test/Transforms/LoopVectorize/PowerPC/exit-branch-cost.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/PowerPC/optimal-epilog-vectorization.ll (+16-32)
  • (modified) llvm/test/Transforms/LoopVectorize/PowerPC/small-loop-rdx.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/PowerPC/vectorize-bswap.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/bf16.ll (+11-19)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/blocks-with-dead-instructions.ll (+4-6)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/dead-ops-cost.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/defaults.ll (+3-5)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/divrem.ll (+36-54)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/evl-compatible-loops.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/f16.ll (+3-5)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/first-order-recurrence-scalable-vf1.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/fminimumnum.ll (+36-72)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll (+8-16)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/interleaved-accesses.ll (+6-12)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/lmul.ll (+8-12)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/low-trip-count.ll (+18-29)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/mask-index-type.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/ordered-reduction.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/partial-reduce-dot-product.ll (+32-64)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/pr87378-vpinstruction-or-drop-poison-generating-flags.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-unroll.ll (+6-12)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/safe-dep-distance.ll (+8-16)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/scalable-basics.ll (+8-14)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/scalable-tailfold.ll (+8-14)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/select-cmp-reduction.ll (+14-28)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/short-trip-count.ll (+6-10)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/strided-accesses.ll (+8-13)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-cost.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/truncate-to-minimal-bitwidth-evl-crash.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/type-info-cache-evl-crash.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll (+30-60)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-bin-unary-ops-args.ll (+36-72)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-call-intrinsics.ll (+22-44)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cast-intrinsics.ll (+21-42)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-cond-reduction.ll (+16-32)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-div.ll (+12-24)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-fixed-order-recurrence.ll (+16-32)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-inloop-reduction.ll (+38-76)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-interleave.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-intermediate-store.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-iv32.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-known-no-overflow.ll (+6-9)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-masked-loadstore.ll (+3-5)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-ordered-reduction.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-reduction.ll (+38-76)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-reverse-load-store.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-safe-dep-distance.ll (+14-28)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-vp-intrinsics.ll (+6-12)
  • (modified) llvm/test/Transforms/LoopVectorize/SystemZ/scalar-steps-with-users-demanding-all-lanes-and-first-lane-only.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/constant-fold.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/conversion-cost.ll (+3-6)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/divs-with-tail-folding.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/drop-poison-generating-flags.ll (+26-52)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/epilog-vectorization-inductions.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/fixed-order-recurrence.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/fminimumnum.ll (+18-36)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll (+16-28)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/gep-use-outside-loop.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/illegal-parallel-loop-uniform-write.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/imprecise-through-phis.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/induction-costs.ll (+10-19)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/induction-step.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/iv-live-outs.ll (+2-3)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/limit-vf-by-tripcount.ll (+15-30)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/load-deref-pred.ll (+11-22)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/masked-store-cost.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/masked_load_store.ll (+120-213)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/metadata-enable.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/optsize.ll (+10-16)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr34438.ll (+3-6)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr35432.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr36524.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr47437.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr51366-sunk-instruction-used-outside-of-loop.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/pr56319-vector-exit-cond-optimization-epilogue-vectorization.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/predicate-switch.ll (+43-56)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/reduction-crash.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/reduction-fastmath.ll (+5-10)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/scev-checks-unprofitable.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/strided_load_cost.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/tail_loop_folding.ll (+8-16)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/transform-narrow-interleave-to-widen-memory.ll (+8-15)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/uniform_load.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll (+12-20)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/vectorize-force-tail-with-evl.ll (+9-18)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/widened-value-used-as-scalar-and-first-lane.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/x86-predication.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/create-induction-resume.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/dead_instructions.ll (+8-10)
  • (modified) llvm/test/Transforms/LoopVectorize/debugloc.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/dereferenceable-info-from-assumption-constant-size.ll (+37-74)
  • (modified) llvm/test/Transforms/LoopVectorize/dereferenceable-info-from-assumption-variable-size.ll (+15-30)
  • (modified) llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-const-TC.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/dont-fold-tail-for-divisible-TC.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/epilog-iv-select-cmp.ll (+18-24)
  • (modified) llvm/test/Transforms/LoopVectorize/epilog-vectorization-any-of-reductions.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/epilog-vectorization-reductions.ll (+10-20)
  • (modified) llvm/test/Transforms/LoopVectorize/epilog-vectorization-trunc-induction-steps.ll (+4-6)
  • (modified) llvm/test/Transforms/LoopVectorize/expand-scev-after-invoke.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/extract-from-end-vector-constant.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-chains.ll (+24-37)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-complex.ll (+25-44)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-dead-instructions.ll (+3-6)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-multiply-recurrences.ll (+3-6)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence-scalable-vf1.ll (+3-6)
  • (modified) llvm/test/Transforms/LoopVectorize/first-order-recurrence.ll (+26-52)
  • (modified) llvm/test/Transforms/LoopVectorize/float-minmax-instruction-flag.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/fmax-without-fast-math-flags-interleave.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/fmax-without-fast-math-flags.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/fmin-without-fast-math-flags.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/fpsat.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/if-pred-non-void.ll (+18-33)
  • (modified) llvm/test/Transforms/LoopVectorize/if-pred-stores.ll (+5-9)
  • (modified) llvm/test/Transforms/LoopVectorize/if-reduction.ll (+16-32)
  • (modified) llvm/test/Transforms/LoopVectorize/induction-step.ll (+10-16)
  • (modified) llvm/test/Transforms/LoopVectorize/induction.ll (+28-54)
  • (modified) llvm/test/Transforms/LoopVectorize/induction_plus.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/instruction-only-used-outside-of-loop.ll (+3-6)
  • (modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses-different-insert-position.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/invalidate-scev-at-scope-after-vectorization.ll (+13-14)
  • (modified) llvm/test/Transforms/LoopVectorize/is_fpclass.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/iv-select-cmp-nested-loop.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/iv-select-cmp-no-wrap.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/iv-select-cmp-trunc.ll (+8-16)
  • (modified) llvm/test/Transforms/LoopVectorize/iv-select-cmp.ll (+26-52)
  • (modified) llvm/test/Transforms/LoopVectorize/iv_outside_user.ll (+7-14)
  • (modified) llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll (+5-10)
  • (modified) llvm/test/Transforms/LoopVectorize/load-deref-pred-poison-ub-ops-feeding-pointer.ll (+2-4)
  • (modified) llvm/test/Transforms/LoopVectorize/load-of-struct-deref-pred.ll (+85-94)
  • (modified) llvm/test/Transforms/LoopVectorize/loop-form.ll (+11-22)
  • (modified) llvm/test/Transforms/LoopVectorize/metadata.ll (+32-56)
  • (modified) llvm/test/Transforms/LoopVectorize/min-trip-count-known-via-scev.ll (+5-10)
  • (modified) llvm/test/Transforms/LoopVectorize/minimumnum-maximumnum-reductions.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/multiple-strides-vectorization.ll (+6-10)
  • (modified) llvm/test/Transforms/LoopVectorize/no-fold-tail-by-masking-iv-external-uses.ll (+1-2)
  • (modified) llvm/test/Transforms/LoopVectorize/no_outside_user.ll (+36-40)
  • (modified) llvm/test/Transforms/LoopVectorize/opaque-ptr.ll (+7-12)
  • (modified) llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization-liveout.ll (+4-8)
  • (modified) llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization.ll (+20-40)
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 204268e586b43..d80de9b960d14 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -1835,6 +1835,8 @@ class VPVectorPointerRecipe : public VPRecipeWithIRFlags,
                                      getGEPNoWrapFlags(), getDebugLoc());
   }
 
+  bool isPart0() { return getUnrollPart(*this) == 0; }
+
   /// Return the cost of this VPHeaderPHIRecipe.
   InstructionCost computeCost(ElementCount VF,
                               VPCostContext &Ctx) const override {
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 2a920832f272f..666ff2354c53a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1015,6 +1015,14 @@ static void simplifyRecipe(VPRecipeBase &R, VPTypeAnalysis &TypeInfo) {
     if (Op->isLiveIn())
       PredPHI->replaceAllUsesWith(Op);
   }
+  if (auto *VecPtr = dyn_cast<VPVectorPointerRecipe>(&R)) {
+    if (VecPtr->getParent()->getPlan()->isUnrolled() && VecPtr->isPart0()) {
+      VecPtr->replaceAllUsesWith(VecPtr->getOperand(0));
+      return;
+    }
+  }
+
+
 
   VPValue *A;
   if (match(Def, m_Trunc(m_ZExtOrSExt(m_VPValue(A))))) {
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
index 43b942458a39e..62329b63bc5e5 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/blend-costs.ll
@@ -22,8 +22,7 @@ define void @test_blend_feeding_replicated_store_1(i64 %N, ptr noalias %src, ptr
 ; CHECK:       [[VECTOR_BODY]]:
 ; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[PRED_STORE_CONTINUE30:.*]] ]
 ; CHECK-NEXT:    [[TMP4:%.*]] = getelementptr inbounds i32, ptr [[SRC]], i64 [[INDEX]]
-; CHECK-NEXT:    [[TMP5:%.*]] = getelementptr inbounds i32, ptr [[TMP4]], i32 0
-; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <16 x i32>, ptr [[TMP5]], align 4
+; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <16 x i32>, ptr [[TMP4]], align 4
 ; CHECK-NEXT:    [[TMP6:%.*]] = icmp slt <16 x i32> [[WIDE_LOAD]], zeroinitializer
 ; CHECK-NEXT:    [[TMP7:%.*]] = select <16 x i1> [[TMP6]], <16 x i1> zeroinitializer, <16 x i1> zeroinitializer
 ; CHECK-NEXT:    [[TMP8:%.*]] = xor <16 x i1> [[TMP6]], splat (i1 true)
@@ -213,8 +212,7 @@ define void @test_blend_feeding_replicated_store_2(ptr noalias %src, ptr %dst, i
 ; CHECK:       [[VECTOR_BODY]]:
 ; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[PRED_STORE_CONTINUE30:.*]] ]
 ; CHECK-NEXT:    [[GEP_SRC:%.*]] = getelementptr inbounds i8, ptr [[SRC]], i32 [[IV]]
-; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr inbounds i8, ptr [[GEP_SRC]], i32 0
-; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <16 x i8>, ptr [[TMP2]], align 1
+; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <16 x i8>, ptr [[GEP_SRC]], align 1
 ; CHECK-NEXT:    [[TMP3:%.*]] = icmp eq <16 x i8> [[WIDE_LOAD]], zeroinitializer
 ; CHECK-NEXT:    [[TMP4:%.*]] = xor <16 x i1> [[TMP3]], splat (i1 true)
 ; CHECK-NEXT:    [[TMP6:%.*]] = select <16 x i1> [[TMP4]], <16 x i1> [[TMP5]], <16 x i1> zeroinitializer
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
index 8c2a48aa38695..7f36e07f924e3 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/call-costs.ll
@@ -14,18 +14,16 @@ define void @fshl_operand_first_order_recurrence(ptr %dst, ptr noalias %src) {
 ; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
 ; CHECK-NEXT:    [[VECTOR_RECUR:%.*]] = phi <2 x i64> [ <i64 poison, i64 0>, %[[VECTOR_PH]] ], [ [[WIDE_LOAD1:%.*]], %[[VECTOR_BODY]] ]
 ; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr inbounds i64, ptr [[SRC]], i64 [[INDEX]]
-; CHECK-NEXT:    [[TMP4:%.*]] = getelementptr inbounds i64, ptr [[TMP2]], i32 0
 ; CHECK-NEXT:    [[TMP5:%.*]] = getelementptr inbounds i64, ptr [[TMP2]], i32 2
-; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <2 x i64>, ptr [[TMP4]], align 8
+; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <2 x i64>, ptr [[TMP2]], align 8
 ; CHECK-NEXT:    [[WIDE_LOAD1]] = load <2 x i64>, ptr [[TMP5]], align 8
 ; CHECK-NEXT:    [[TMP6:%.*]] = shufflevector <2 x i64> [[VECTOR_RECUR]], <2 x i64> [[WIDE_LOAD]], <2 x i32> <i32 1, i32 2>
 ; CHECK-NEXT:    [[TMP7:%.*]] = shufflevector <2 x i64> [[WIDE_LOAD]], <2 x i64> [[WIDE_LOAD1]], <2 x i32> <i32 1, i32 2>
 ; CHECK-NEXT:    [[TMP8:%.*]] = call <2 x i64> @llvm.fshl.v2i64(<2 x i64> splat (i64 1), <2 x i64> [[TMP6]], <2 x i64> splat (i64 1))
 ; CHECK-NEXT:    [[TMP9:%.*]] = call <2 x i64> @llvm.fshl.v2i64(<2 x i64> splat (i64 1), <2 x i64> [[TMP7]], <2 x i64> splat (i64 1))
 ; CHECK-NEXT:    [[TMP10:%.*]] = getelementptr inbounds i64, ptr [[DST]], i64 [[INDEX]]
-; CHECK-NEXT:    [[TMP12:%.*]] = getelementptr inbounds i64, ptr [[TMP10]], i32 0
 ; CHECK-NEXT:    [[TMP13:%.*]] = getelementptr inbounds i64, ptr [[TMP10]], i32 2
-; CHECK-NEXT:    store <2 x i64> [[TMP8]], ptr [[TMP12]], align 8
+; CHECK-NEXT:    store <2 x i64> [[TMP8]], ptr [[TMP10]], align 8
 ; CHECK-NEXT:    store <2 x i64> [[TMP9]], ptr [[TMP13]], align 8
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
 ; CHECK-NEXT:    [[TMP14:%.*]] = icmp eq i64 [[INDEX_NEXT]], 100
@@ -79,11 +77,9 @@ define void @powi_call(ptr %P) {
 ; CHECK:       [[VECTOR_PH]]:
 ; CHECK-NEXT:    br label %[[VECTOR_BODY:.*]]
 ; CHECK:       [[VECTOR_BODY]]:
-; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr inbounds double, ptr [[P]], i32 0
-; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <2 x double>, ptr [[TMP2]], align 8
+; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <2 x double>, ptr [[P]], align 8
 ; CHECK-NEXT:    [[TMP3:%.*]] = call <2 x double> @llvm.powi.v2f64.i32(<2 x double> [[WIDE_LOAD]], i32 3)
-; CHECK-NEXT:    [[TMP4:%.*]] = getelementptr inbounds double, ptr [[P]], i32 0
-; CHECK-NEXT:    store <2 x double> [[TMP3]], ptr [[TMP4]], align 8
+; CHECK-NEXT:    store <2 x double> [[TMP3]], ptr [[P]], align 8
 ; CHECK-NEXT:    br label %[[MIDDLE_BLOCK:.*]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
 ; CHECK-NEXT:    br i1 true, label %[[EXIT:.*]], label %[[SCALAR_PH]]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll b/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
index 95f3eb7b21f4e..795de3d978e74 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/clamped-trip-count.ll
@@ -33,8 +33,7 @@ define void @clamped_tc_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range(1,1
 ; CHECK-NEXT:    [[TMP10:%.*]] = shl nuw nsw <vscale x 8 x i64> [[VEC_IND]], splat (i64 3)
 ; CHECK-NEXT:    [[TMP11:%.*]] = lshr <vscale x 8 x i64> [[BROADCAST_SPLAT]], [[TMP10]]
 ; CHECK-NEXT:    [[TMP14:%.*]] = trunc <vscale x 8 x i64> [[TMP11]] to <vscale x 8 x i8>
-; CHECK-NEXT:    [[TMP17:%.*]] = getelementptr i8, ptr [[NEXT_GEP]], i32 0
-; CHECK-NEXT:    call void @llvm.masked.store.nxv8i8.p0(<vscale x 8 x i8> [[TMP14]], ptr [[TMP17]], i32 1, <vscale x 8 x i1> [[ACTIVE_LANE_MASK]])
+; CHECK-NEXT:    call void @llvm.masked.store.nxv8i8.p0(<vscale x 8 x i8> [[TMP14]], ptr [[NEXT_GEP]], i32 1, <vscale x 8 x i1> [[ACTIVE_LANE_MASK]])
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP6]]
 ; CHECK-NEXT:    [[ACTIVE_LANE_MASK_NEXT]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 [[INDEX_NEXT]], i64 8)
 ; CHECK-NEXT:    [[VEC_IND_NEXT]] = add <vscale x 8 x i64> [[VEC_IND]], [[DOTSPLAT]]
@@ -117,8 +116,7 @@ define void @clamped_tc_max_8(ptr nocapture %dst, i32 %n, i64 %val) vscale_range
 ; CHECK-NEXT:    [[TMP10:%.*]] = shl nuw nsw <vscale x 8 x i64> [[VEC_IND]], splat (i64 3)
 ; CHECK-NEXT:    [[TMP11:%.*]] = lshr <vscale x 8 x i64> [[BROADCAST_SPLAT]], [[TMP10]]
 ; CHECK-NEXT:    [[TMP14:%.*]] = trunc <vscale x 8 x i64> [[TMP11]] to <vscale x 8 x i8>
-; CHECK-NEXT:    [[TMP17:%.*]] = getelementptr i8, ptr [[NEXT_GEP]], i32 0
-; CHECK-NEXT:    call void @llvm.masked.store.nxv8i8.p0(<vscale x 8 x i8> [[TMP14]], ptr [[TMP17]], i32 1, <vscale x 8 x i1> [[ACTIVE_LANE_MASK]])
+; CHECK-NEXT:    call void @llvm.masked.store.nxv8i8.p0(<vscale x 8 x i8> [[TMP14]], ptr [[NEXT_GEP]], i32 1, <vscale x 8 x i1> [[ACTIVE_LANE_MASK]])
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP6]]
 ; CHECK-NEXT:    [[ACTIVE_LANE_MASK_NEXT]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 [[INDEX_NEXT]], i64 [[WIDE_TRIP_COUNT]])
 ; CHECK-NEXT:    [[VEC_IND_NEXT]] = add <vscale x 8 x i64> [[VEC_IND]], [[DOTSPLAT]]
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll b/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
index 46a194dacad9e..9a75d797bf0b7 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/conditional-branches-cost.ll
@@ -82,9 +82,8 @@ define void @loop_dependent_cond(ptr %src, ptr noalias %dst, i64 %N) {
 ; DEFAULT:       [[VECTOR_BODY]]:
 ; DEFAULT-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[PRED_STORE_CONTINUE7:.*]] ]
 ; DEFAULT-NEXT:    [[TMP3:%.*]] = getelementptr double, ptr [[SRC]], i64 [[INDEX]]
-; DEFAULT-NEXT:    [[TMP5:%.*]] = getelementptr double, ptr [[TMP3]], i32 0
 ; DEFAULT-NEXT:    [[TMP6:%.*]] = getelementptr double, ptr [[TMP3]], i32 2
-; DEFAULT-NEXT:    [[WIDE_LOAD:%.*]] = load <2 x double>, ptr [[TMP5]], align 8
+; DEFAULT-NEXT:    [[WIDE_LOAD:%.*]] = load <2 x double>, ptr [[TMP3]], align 8
 ; DEFAULT-NEXT:    [[WIDE_LOAD1:%.*]] = load <2 x double>, ptr [[TMP6]], align 8
 ; DEFAULT-NEXT:    [[TMP7:%.*]] = call <2 x double> @llvm.fabs.v2f64(<2 x double> [[WIDE_LOAD]])
 ; DEFAULT-NEXT:    [[TMP8:%.*]] = call <2 x double> @llvm.fabs.v2f64(<2 x double> [[WIDE_LOAD1]])
@@ -341,9 +340,8 @@ define void @latch_branch_cost(ptr %dst) {
 ; DEFAULT:       [[VECTOR_BODY]]:
 ; DEFAULT-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
 ; DEFAULT-NEXT:    [[TMP2:%.*]] = getelementptr i8, ptr [[DST]], i64 [[INDEX]]
-; DEFAULT-NEXT:    [[TMP6:%.*]] = getelementptr i8, ptr [[TMP2]], i32 0
 ; DEFAULT-NEXT:    [[TMP5:%.*]] = getelementptr i8, ptr [[TMP2]], i32 16
-; DEFAULT-NEXT:    store <16 x i8> zeroinitializer, ptr [[TMP6]], align 1
+; DEFAULT-NEXT:    store <16 x i8> zeroinitializer, ptr [[TMP2]], align 1
 ; DEFAULT-NEXT:    store <16 x i8> zeroinitializer, ptr [[TMP5]], align 1
 ; DEFAULT-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 32
 ; DEFAULT-NEXT:    [[TMP4:%.*]] = icmp eq i64 [[INDEX_NEXT]], 96
@@ -358,8 +356,7 @@ define void @latch_branch_cost(ptr %dst) {
 ; DEFAULT:       [[VEC_EPILOG_VECTOR_BODY]]:
 ; DEFAULT-NEXT:    [[INDEX1:%.*]] = phi i64 [ [[VEC_EPILOG_RESUME_VAL]], %[[VEC_EPILOG_PH]] ], [ [[INDEX_NEXT2:%.*]], %[[VEC_EPILOG_VECTOR_BODY]] ]
 ; DEFAULT-NEXT:    [[TMP8:%.*]] = getelementptr i8, ptr [[DST]], i64 [[INDEX1]]
-; DEFAULT-NEXT:    [[TMP9:%.*]] = getelementptr i8, ptr [[TMP8]], i32 0
-; DEFAULT-NEXT:    store <4 x i8> zeroinitializer, ptr [[TMP9]], align 1
+; DEFAULT-NEXT:    store <4 x i8> zeroinitializer, ptr [[TMP8]], align 1
 ; DEFAULT-NEXT:    [[INDEX_NEXT2]] = add nuw i64 [[INDEX1]], 4
 ; DEFAULT-NEXT:    [[TMP10:%.*]] = icmp eq i64 [[INDEX_NEXT2]], 100
 ; DEFAULT-NEXT:    br i1 [[TMP10]], label %[[VEC_EPILOG_MIDDLE_BLOCK:.*]], label %[[VEC_EPILOG_VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
@@ -575,8 +572,7 @@ define i32 @header_mask_and_invariant_compare(ptr %A, ptr %B, ptr %C, ptr %D, pt
 ; DEFAULT-NEXT:    store i32 [[TMP22]], ptr [[E]], align 4, !alias.scope [[META14]], !noalias [[META16]]
 ; DEFAULT-NEXT:    br label %[[PRED_STORE_CONTINUE37]]
 ; DEFAULT:       [[PRED_STORE_CONTINUE37]]:
-; DEFAULT-NEXT:    [[TMP17:%.*]] = getelementptr i32, ptr [[TMP16]], i32 0
-; DEFAULT-NEXT:    call void @llvm.masked.store.v4i32.p0(<4 x i32> zeroinitializer, ptr [[TMP17]], i32 4, <4 x i1> [[TMP8]]), !alias.scope [[META18:![0-9]+]], !noalias [[META19:![0-9]+]]
+; DEFAULT-NEXT:    call void @llvm.masked.store.v4i32.p0(<4 x i32> zeroinitializer, ptr [[TMP16]], i32 4, <4 x i1> [[TMP8]]), !alias.scope [[META18:![0-9]+]], !noalias [[META19:![0-9]+]]
 ; DEFAULT-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
 ; DEFAULT-NEXT:    [[TMP18:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
 ; DEFAULT-NEXT:    br i1 [[TMP18]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP20:![0-9]+]]
@@ -674,8 +670,7 @@ define void @multiple_exit_conditions(ptr %src, ptr noalias %dst) #1 {
 ; DEFAULT-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <8 x i16> [[BROADCAST_SPLATINSERT]], <8 x i16> poison, <8 x i32> zeroinitializer
 ; DEFAULT-NEXT:    [[TMP2:%.*]] = or <8 x i16> [[BROADCAST_SPLAT]], splat (i16 1)
 ; DEFAULT-NEXT:    [[TMP3:%.*]] = uitofp <8 x i16> [[TMP2]] to <8 x double>
-; DEFAULT-NEXT:    [[TMP4:%.*]] = getelementptr double, ptr [[NEXT_GEP]], i32 0
-; DEFAULT-NEXT:    store <8 x double> [[TMP3]], ptr [[TMP4]], align 8
+; DEFAULT-NEXT:    store <8 x double> [[TMP3]], ptr [[NEXT_GEP]], align 8
 ; DEFAULT-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 8
 ; DEFAULT-NEXT:    [[TMP5:%.*]] = icmp eq i64 [[INDEX_NEXT]], 256
 ; DEFAULT-NEXT:    br i1 [[TMP5]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP22:![0-9]+]]
@@ -730,8 +725,7 @@ define void @multiple_exit_conditions(ptr %src, ptr noalias %dst) #1 {
 ; PRED-NEXT:    [[BROADCAST_SPLAT:%.*]] = shufflevector <vscale x 2 x i16> [[BROADCAST_SPLATINSERT]], <vscale x 2 x i16> poison, <vscale x 2 x i32> zeroinitializer
 ; PRED-NEXT:    [[TMP13:%.*]] = or <vscale x 2 x i16> [[BROADCAST_SPLAT]], splat (i16 1)
 ; PRED-NEXT:    [[TMP14:%.*]] = uitofp <vscale x 2 x i16> [[TMP13]] to <vscale x 2 x double>
-; PRED-NEXT:    [[TMP15:%.*]] = getelementptr double, ptr [[NEXT_GEP]], i32 0
-; PRED-NEXT:    call void @llvm.masked.store.nxv2f64.p0(<vscale x 2 x double> [[TMP14]], ptr [[TMP15]], i32 8, <vscale x 2 x i1> [[ACTIVE_LANE_MASK]])
+; PRED-NEXT:    call void @llvm.masked.store.nxv2f64.p0(<vscale x 2 x double> [[TMP14]], ptr [[NEXT_GEP]], i32 8, <vscale x 2 x i1> [[ACTIVE_LANE_MASK]])
 ; PRED-NEXT:    [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP5]]
 ; PRED-NEXT:    [[ACTIVE_LANE_MASK_NEXT]] = call <vscale x 2 x i1> @llvm.get.active.lane.mask.nxv2i1.i64(i64 [[INDEX]], i64 [[TMP10]])
 ; PRED-NEXT:    [[TMP16:%.*]] = xor <vscale x 2 x i1> [[ACTIVE_LANE_MASK_NEXT]], splat (i1 true)
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll b/llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll
index d42be20ea1e73..1ad1e42678c5a 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll
@@ -38,11 +38,10 @@ define void @sdiv_feeding_gep(ptr %dst, i32 %x, i64 %M, i64 %conv6, i64 %N) {
 ; CHECK-NEXT:    [[TMP30:%.*]] = add i32 [[TMP28]], [[TMP26]]
 ; CHECK-NEXT:    [[TMP32:%.*]] = sext i32 [[TMP30]] to i64
 ; CHECK-NEXT:    [[TMP34:%.*]] = getelementptr double, ptr [[DST]], i64 [[TMP32]]
-; CHECK-NEXT:    [[TMP36:%.*]] = getelementptr double, ptr [[TMP34]], i32 0
 ; CHECK-NEXT:    [[TMP37:%.*]] = call i64 @llvm.vscale.i64()
 ; CHECK-NEXT:    [[TMP38:%.*]] = mul nuw i64 [[TMP37]], 2
 ; CHECK-NEXT:    [[TMP39:%.*]] = getelementptr double, ptr [[TMP34]], i64 [[TMP38]]
-; CHECK-NEXT:    store <vscale x 2 x double> zeroinitializer, ptr [[TMP36]], align 8
+; CHECK-NEXT:    store <vscale x 2 x double> zeroinitializer, ptr [[TMP34]], align 8
 ; CHECK-NEXT:    store <vscale x 2 x double> zeroinitializer, ptr [[TMP39]], align 8
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP11]]
 ; CHECK-NEXT:    [[TMP40:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
@@ -149,8 +148,7 @@ define void @sdiv_feeding_gep_predicated(ptr %dst, i32 %x, i64 %M, i64 %conv6, i
 ; CHECK-NEXT:    [[TMP32:%.*]] = add i32 [[TMP31]], [[TMP30]]
 ; CHECK-NEXT:    [[TMP33:%.*]] = sext i32 [[TMP32]] to i64
 ; CHECK-NEXT:    [[TMP34:%.*]] = getelementptr double, ptr [[DST]], i64 [[TMP33]]
-; CHECK-NEXT:    [[TMP35:%.*]] = getelementptr double, ptr [[TMP34]], i32 0
-; CHECK-NEXT:    call void @llvm.masked.store.nxv2f64.p0(<vscale x 2 x double> zeroinitializer, ptr [[TMP35]], i32 8, <vscale x 2 x i1> [[TMP23]])
+; CHECK-NEXT:    call void @llvm.masked.store.nxv2f64.p0(<vscale x 2 x double> zeroinitializer, ptr [[TMP34]], i32 8, <vscale x 2 x i1> [[TMP23]])
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP9]]
 ; CHECK-NEXT:    [[ACTIVE_LANE_MASK_NEXT]] = call <vscale x 2 x i1> @llvm.get.active.lane.mask.nxv2i1.i64(i64 [[INDEX]], i64 [[TMP14]])
 ; CHECK-NEXT:    [[TMP36:%.*]] = xor <vscale x 2 x i1> [[ACTIVE_LANE_MASK_NEXT]], splat (i1 true)
@@ -275,8 +273,7 @@ define void @udiv_urem_feeding_gep(i64 %x, ptr %dst, i64 %N) {
 ; CHECK-NEXT:    [[TMP36:%.*]] = shl i64 [[TMP35]], 32
 ; CHECK-NEXT:    [[TMP37:%.*]] = ashr i64 [[TMP36]], 32
 ; CHECK-NEXT:    [[TMP38:%.*]] = getelementptr i64, ptr [[DST]], i64 [[TMP37]]
-; CHECK-NEXT:    [[TMP39:%.*]] = getelementptr i64, ptr [[TMP38]], i32 0
-; CHECK-NEXT:    call void @llvm.masked.store.nxv2i64.p0(<vscale x 2 x i64> [[TMP23]], ptr [[TMP39]], i32 4, <vscale x 2 x i1> [[ACTIVE_LANE_MASK]])
+; CHECK-NEXT:    call void @llvm.masked.store.nxv2i64.p0(<vscale x 2 x i64> [[TMP23]], ptr [[TMP38]], i32 4, <vscale x 2 x i1> [[ACTIVE_LANE_MASK]])
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add i64 [[INDEX]], [[TMP9]]
 ; CHECK-NEXT:    [[ACTIVE_LANE_MASK_NEXT]] = call <vscale x 2 x i1> @llvm.get.active.lane.mask.nxv2i1.i64(i64 [[INDEX]], i64 [[TMP14]])
 ; CHECK-NEXT:    [[TMP47:%.*]] = xor <vscale x 2 x i1> [[ACTIVE_LANE_MASK_NEXT]], splat (i1 true)
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll b/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll
index e28c79eac1e5c..db9d05b7bbf29 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll
@@ -15,15 +15,13 @@ define void @check_widen_intrinsic_with_nnan(ptr noalias %dst.0, ptr noalias %ds
 ; CHECK:       [[VECTOR_BODY]]:
 ; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[PRED_LOAD_CONTINUE6:.*]] ]
 ; CHECK-NEXT:    [[TMP1:%.*]] = getelementptr inbounds double, ptr [[SRC_1]], i64 [[INDEX]]
-; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr inbounds double, ptr [[TMP1]], i32 0
-; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <4 x double>, ptr [[TMP2]], align 8
+; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <4 x double>, ptr [[TMP1]], align 8
 ; CHECK-NEXT:    [[TMP3:%.*]] = call <4 x double> @llvm.fabs.v4f64(<4 x double> [[WIDE_LOAD]])
 ; CHECK-NEXT:    [[TMP4:%.*]] = fcmp olt <4 x double> [[TMP3]], splat (double 1.000000e+00)
 ; CHECK-NEXT:    [[TMP5:%.*]] = xor <4 x i1> [[TMP4]], splat (i1 true)
 ; CHECK-NEXT:    [[TMP6:%.*]] = add i64 [[INDEX]], -1
 ; CHECK-NEXT:    [[TMP7:%.*]] = getelementptr double, ptr [[DST_0]], i64 [[TMP6]]
-; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr double, ptr [[TMP7]], i32 0
-; CHECK-NEXT:    call void @llvm.masked.store.v4f64.p0(<4 x double> zeroinitializer, ptr [[TMP8]], i32 8, <4 x i1> [[TMP5]])
+; CHECK-NEXT:    call void @llvm.masked.store.v4f64.p0(<4 x double> zeroinitializer, ptr [[TMP7]], i32 8, <4 x i1> [[TMP5]])
 ; CHECK-NEXT:    [[TMP9:%.*]] = extractelement <4 x i1> [[TMP4]], i32 0
 ; CHECK-NEXT:    br i1 [[TMP9]], label %[[PRED_LOAD_IF:.*]], label %[[PRED_LOAD_CONTINUE:.*]]
 ; CHECK:       [[PRED_LOAD_IF]]:
@@ -58,16 +56,14 @@ define void @check_widen_intrinsic_with_nnan(ptr noalias %dst.0, ptr noalias %ds
 ; CHECK-NEXT:    [[TMP24:%.*]] = phi <4 x double> [ [[TMP20]], %[[PRED_LOAD_CONTINUE4]] ], [ [[TMP23]], %[[PRED_LOAD_IF5]] ]
 ; CHECK-NEXT:    [[TMP25:%.*]] = add i64 [[INDEX]], -1
 ; CHECK-NEXT:    [[TMP26:%.*]] = getelementptr double, ptr [[DST_0]], i64 [[TMP25]]
-; CHECK-NEXT:    [[TMP27:%.*]] = getelementptr double, ptr [[TMP26]], i32 0
-; CHECK-NEXT:    call void @llvm.masked.store.v4f64.p0(<4 x double> zeroinitializer, ptr [[TMP27]], i32 8, <4 x i1> [[TMP4]])
+; CHECK-NEXT:    call void @llvm.masked.store...
[truncated]

github-actions bot commented Jul 20, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

VPVectorPointer for part 0 is just the pointer operand. Simplify it
after unrolling. This removes a large number of redundant GEPs with
index 0.
fhahn force-pushed the vplan-more-unroll-simps branch from b0f2110 to 81d97e2 on July 20, 2025 at 19:51
lukel97 (Contributor) commented Jul 21, 2025

Looks like there's a test failing in Transforms/LoopLoadElim/versioning-scev-invalidation.ll

@@ -1015,6 +1015,12 @@ static void simplifyRecipe(VPRecipeBase &R, VPTypeAnalysis &TypeInfo) {
    if (Op->isLiveIn())
      PredPHI->replaceAllUsesWith(Op);
  }
  if (auto *VecPtr = dyn_cast<VPVectorPointerRecipe>(&R)) {
    if (VecPtr->getParent()->getPlan()->isUnrolled() && VecPtr->isPart0()) {
Contributor commented:

Is it necessary to check if the plan is unrolled? I.e. can we also simplify it for UF=1?

Suggested change:
-    if (VecPtr->getParent()->getPlan()->isUnrolled() && VecPtr->isPart0()) {
+    if (VecPtr->isPart0()) {

Contributor replied: +1

Contributor replied: +1

@@ -1835,6 +1835,8 @@ class VPVectorPointerRecipe : public VPRecipeWithIRFlags,
                                      getGEPNoWrapFlags(), getDebugLoc());
  }

  bool isPart0() { return getUnrollPart(*this) == 0; }
Contributor commented:

nit: isFirstPart() or isZeroPart()

Contributor commented:

Suggested change:
-  bool isPart0() { return getUnrollPart(*this) == 0; }
+  bool isPart0() const { return getUnrollPart(*this) == 0; }

4 participants