[mlir][vector] Support multi-dimensional vectors in VectorFromElementsLowering #151175

Open · wants to merge 4 commits into base: main
Conversation

Contributor

@yangtetris yangtetris commented Jul 29, 2025

This patch extends the VectorFromElementsLowering conversion pattern to support
vectors of any rank, removing the previous restriction to 0D/1D vectors only.

Implementation Details:

  1. 0D vectors: Handled explicitly since LLVMTypeConverter converts them to
    length-1 1D vectors
  2. 1D vectors: Direct construction using llvm.insertelement operations
  3. N-D vectors: Two-phase construction:
    • Build 1D vectors for the innermost dimension using llvm.insertelement
    • Assemble them into the nested aggregate structure using llvm.insertvalue
      and nDVectorIterate
  4. Use direct LLVM dialect operations instead of intermediate vector.insert operations for efficiency

Example:

// Before: Failed for rank > 1
%v = vector.from_elements %e0, %e1, %e2, %e3 : vector<2x2xf32>

// After: Converts to nested aggregate
%poison = llvm.mlir.poison : !llvm.array<2 x vector<2xf32>>
%inner0 = llvm.insertelement %e0, %poison_1d[%c0] : vector<2xf32>
%inner0 = llvm.insertelement %e1, %inner0[%c1] : vector<2xf32>
%inner1 = llvm.insertelement %e2, %poison_1d[%c0] : vector<2xf32>
%inner1 = llvm.insertelement %e3, %inner1[%c1] : vector<2xf32>
%result = llvm.insertvalue %inner0, %poison[0] : !llvm.array<2 x vector<2xf32>>
%result = llvm.insertvalue %inner1, %result[1] : !llvm.array<2 x vector<2xf32>>

@llvmbot
Member

llvmbot commented Jul 29, 2025

@llvm/pr-subscribers-mlir-vector
@llvm/pr-subscribers-mlir-llvm

@llvm/pr-subscribers-mlir

Author: Yang Bai (yangtetris)

Full diff: https://github.com/llvm/llvm-project/pull/151175.diff

2 Files Affected:

  • (modified) mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp (+55-8)
  • (modified) mlir/test/Conversion/VectorToLLVM/vector-to-llvm-interface.mlir (+24)
diff --git a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
index 17a79e3815b97..26d056cadb19c 100644
--- a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
+++ b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
@@ -1890,15 +1890,62 @@ struct VectorFromElementsLowering
                   ConversionPatternRewriter &rewriter) const override {
     Location loc = fromElementsOp.getLoc();
     VectorType vectorType = fromElementsOp.getType();
-    // TODO: Multi-dimensional vectors lower to !llvm.array<... x vector<>>.
-    // Such ops should be handled in the same way as vector.insert.
-    if (vectorType.getRank() > 1)
-      return rewriter.notifyMatchFailure(fromElementsOp,
-                                         "rank > 1 vectors are not supported");
     Type llvmType = typeConverter->convertType(vectorType);
-    Value result = LLVM::PoisonOp::create(rewriter, loc, llvmType);
-    for (auto [idx, val] : llvm::enumerate(adaptor.getElements()))
-      result = vector::InsertOp::create(rewriter, loc, val, result, idx);
+    Type llvmIndexType = typeConverter->convertType(rewriter.getIndexType());
+
+    Value result;
+    // 0D vectors are converted to length-1 1D vectors by LLVMTypeConverter.
+    if (vectorType.getRank() == 0) {
+      result = LLVM::PoisonOp::create(rewriter, loc, llvmType);
+      auto index0 = LLVM::ConstantOp::create(rewriter, loc, llvmIndexType, 0);
+      result = LLVM::InsertElementOp::create(rewriter, loc, result, adaptor.getElements().front(), index0);
+      rewriter.replaceOp(fromElementsOp, result);
+      return success();
+    }
+    
+    // Build 1D vectors for the innermost dimension
+    int64_t innerDimSize = vectorType.getShape().back();
+    int64_t numInnerVectors = vectorType.getNumElements() / innerDimSize;
+
+    SmallVector<Value> innerVectors;
+    innerVectors.reserve(numInnerVectors);
+
+    auto innerVectorType = VectorType::get(innerDimSize, vectorType.getElementType());
+    Type llvmInnerType = typeConverter->convertType(innerVectorType);
+
+    int64_t elementInVectorIdx = 0;
+    Value innerVector;
+    for (auto val : adaptor.getElements()) {
+      if (elementInVectorIdx == 0)
+        innerVector = LLVM::PoisonOp::create(rewriter, loc, llvmInnerType);
+      auto position = LLVM::ConstantOp::create(rewriter, loc, llvmIndexType, elementInVectorIdx);
+      innerVector = LLVM::InsertElementOp::create(rewriter, loc, llvmInnerType, innerVector, val, position);
+      if (++elementInVectorIdx == innerDimSize) {
+        innerVectors.push_back(innerVector);
+        elementInVectorIdx = 0;
+      }
+    }
+
+    // For 1D vectors, we can just return the first innermost vector.
+    if (vectorType.getRank() == 1) {
+      rewriter.replaceOp(fromElementsOp, innerVectors.front());
+      return success();
+    }
+
+    // Now build the nested aggregate structure from these 1D vectors.
+    result = LLVM::PoisonOp::create(rewriter, loc, llvmType);
+    
+    // Use the same iteration approach as VectorBroadcastScalarToNdLowering to
+    // insert the 1D vectors into the aggregate.
+    auto vectorTypeInfo = LLVM::detail::extractNDVectorTypeInfo(vectorType, *getTypeConverter());
+    if (!vectorTypeInfo.llvmNDVectorTy)
+      return failure();
+    int64_t vectorIdx = 0;
+    nDVectorIterate(vectorTypeInfo, rewriter, [&](ArrayRef<int64_t> position) {
+      result = LLVM::InsertValueOp::create(rewriter, loc, result, 
+                                           innerVectors[vectorIdx++], position);
+    });
+    
     rewriter.replaceOp(fromElementsOp, result);
     return success();
   }
diff --git a/mlir/test/Conversion/VectorToLLVM/vector-to-llvm-interface.mlir b/mlir/test/Conversion/VectorToLLVM/vector-to-llvm-interface.mlir
index 31e17fb3e3cc6..834858c0b7c8f 100644
--- a/mlir/test/Conversion/VectorToLLVM/vector-to-llvm-interface.mlir
+++ b/mlir/test/Conversion/VectorToLLVM/vector-to-llvm-interface.mlir
@@ -2286,6 +2286,30 @@ func.func @from_elements_0d(%arg0: f32) -> vector<f32> {
 
 // -----
 
+// CHECK-LABEL: func.func @from_elements_3d(
+//  CHECK-SAME:     %[[ARG_0:.*]]: f32, %[[ARG_1:.*]]: f32, %[[ARG_2:.*]]: f32, %[[ARG_3:.*]]: f32)
+//       CHECK:   %[[UNDEF_VEC0:.*]] = llvm.mlir.poison : vector<2xf32>
+//       CHECK:   %[[C0_0:.*]] = llvm.mlir.constant(0 : i64) : i64
+//       CHECK:   %[[VEC0_0:.*]] = llvm.insertelement %[[ARG_0]], %[[UNDEF_VEC0]][%[[C0_0]] : i64] : vector<2xf32>
+//       CHECK:   %[[C1_0:.*]] = llvm.mlir.constant(1 : i64) : i64
+//       CHECK:   %[[VEC0_1:.*]] = llvm.insertelement %[[ARG_1]], %[[VEC0_0]][%[[C1_0]] : i64] : vector<2xf32>
+//       CHECK:   %[[UNDEF_VEC1:.*]] = llvm.mlir.poison : vector<2xf32>
+//       CHECK:   %[[C0_1:.*]] = llvm.mlir.constant(0 : i64) : i64
+//       CHECK:   %[[VEC1_0:.*]] = llvm.insertelement %[[ARG_2]], %[[UNDEF_VEC1]][%[[C0_1]] : i64] : vector<2xf32>
+//       CHECK:   %[[C1_1:.*]] = llvm.mlir.constant(1 : i64) : i64
+//       CHECK:   %[[VEC1_1:.*]] = llvm.insertelement %[[ARG_3]], %[[VEC1_0]][%[[C1_1]] : i64] : vector<2xf32>
+//       CHECK:   %[[UNDEF_RES:.*]] = llvm.mlir.poison : !llvm.array<2 x array<1 x vector<2xf32>>>
+//       CHECK:   %[[RES_0:.*]] = llvm.insertvalue %[[VEC0_1]], %[[UNDEF_RES]][0, 0] : !llvm.array<2 x array<1 x vector<2xf32>>>
+//       CHECK:   %[[RES_1:.*]] = llvm.insertvalue %[[VEC1_1]], %[[RES_0]][1, 0] : !llvm.array<2 x array<1 x vector<2xf32>>>
+//       CHECK:   %[[CAST:.*]] = builtin.unrealized_conversion_cast %[[RES_1]] : !llvm.array<2 x array<1 x vector<2xf32>>> to vector<2x1x2xf32>
+//       CHECK:   return %[[CAST]]
+func.func @from_elements_3d(%arg0: f32, %arg1: f32, %arg2: f32, %arg3: f32) -> vector<2x1x2xf32> {
+  %0 = vector.from_elements %arg0, %arg1, %arg2, %arg3 : vector<2x1x2xf32>
+  return %0 : vector<2x1x2xf32>
+}
+
+// -----
+
 //===----------------------------------------------------------------------===//
 // vector.to_elements
 //===----------------------------------------------------------------------===//


github-actions bot commented Jul 29, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Contributor

@dcaballe dcaballe left a comment

LGTM, just minor comments. Feel free to address them before landing. Thanks!

Comment on lines 1942 to 1951
// Use the same iteration approach as VectorBroadcastScalarToNdLowering to
// insert the 1D vectors into the aggregate.
auto vectorTypeInfo =
LLVM::detail::extractNDVectorTypeInfo(vectorType, *getTypeConverter());
if (!vectorTypeInfo.llvmNDVectorTy)
return failure();
int64_t vectorIdx = 0;
nDVectorIterate(vectorTypeInfo, rewriter, [&](ArrayRef<int64_t> position) {
result = LLVM::InsertValueOp::create(rewriter, loc, result,
innerVectors[vectorIdx++], position);
Contributor

Is there a chance to refactor this code for both cases? This sounds like a common pattern that other ops might need as well...

Contributor Author

Yeah, other vector ops might also use this pattern. I just added a new overload of nDVectorIterate which accepts a VectorType and internally calls extractNDVectorTypeInfo. But I didn't change the usage in VectorBroadcastScalarToNdLowering, because it does some work that depends on the result of extractNDVectorTypeInfo before it can call nDVectorIterate.

yangtetris and others added 2 commits July 31, 2025 10:23
Co-authored-by: Nicolas Vasilache <[email protected]>
Contributor

@banach-space banach-space left a comment

Thanks, LGTM

I really appreciate your clear PR summaries, thank you!

[nit] `%poison = llvm.mlir.poison : !llvm.array<2 x vector<2xf32>>` -> `%poison_1d = llvm.mlir.poison : !llvm.array<2 x vector<2xf32>>`?

@newling
Contributor

newling commented Aug 6, 2025

What about having a pattern in the lowering that inserts a shape_cast (N-D -> 1-D) first? Will the resulting IR be worse? If so, does that mean the shape_cast lowering needs improvement?

Member

@Groverkss Groverkss left a comment

I don't think this is how it should be done for N-D vectors. This should be done the same way as other vector ops, by separating the unrolling transformation from the conversion.

Most LLVM conversions only support 1-D/0-D vectors, and there are separate transformations which unroll.

@Groverkss
Member

This should be implemented the same way #132227 is implemented. We should never do unrolling like this in conversion. It should always be a separate pattern.

@dcaballe
Contributor

dcaballe commented Aug 6, 2025

What about having a pattern in the lowering that inserts a shape_cast (N-D -> 1-D) first? Will the resulting IR be worse? If so, does that mean the shape_cast lowering needs improvement?

IIRC, there is no lowering for shape cast ops in this set of patterns so that would create a dependency with the independent lowering of shape cast ops. This definitely needs some work as I'm not sure keeping them separate makes sense anymore. Something we should address separately.

This should be implemented the same way #132227 is implemented.

This makes sense to me. We may want to revisit this for ops that already do it. My understanding is that this is mostly inspired by the existing conversion of vector.insert.

@yangtetris
Contributor Author

This should be implemented the same way #132227 is implemented. We should never do unrolling like this in conversion. It should always be a separate pattern.

Thank you for your feedback. Decoupling the unrolling transformation from the conversion seems like a good idea. However, I'm wondering: if the conversion pattern only supports 1-D vectors, how would we support converting multi-dim vectors in VectorToLLVMDialectInterface? There is no transformation stage in the convert-to-llvm pass. Or is it okay not to support from_elements with multi-dim vectors in convert-to-llvm?

@dcaballe
Contributor

dcaballe commented Aug 7, 2025

Or, is it okay to not support from_elements with multi-dim vectors in the convert-to-llvm?

Yes, it makes sense that some independent prep work is required before the actual conversion to keep the conversion simpler, so just bailing out for multi-dim vectors should be ok.

@newling
Contributor

newling commented Aug 7, 2025

The idea is to have a vector-to-vector lowering here, and to keep the actual vector-to-llvm conversion simple and restricted to already 'lowered' vector ops.

So I guess the request is to implement a function populateVectorFromElementsOpLoweringPatterns. Or my suggestion was to add a pattern there to split a rank-D from_elements op into a rank-1 from_elements op preceded by a shape_cast, and then populateVectorShapeCastLoweringPatterns will kick in and generate the inserts/extracts.
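A rough MLIR sketch of that suggested rewrite (hand-written here for illustration, not output of any actual pattern; value names are made up):

```mlir
// Before: a rank-2 from_elements.
%v = vector.from_elements %e0, %e1, %e2, %e3 : vector<2x2xf32>

// After: a rank-1 from_elements followed by a shape_cast, which
// populateVectorShapeCastLoweringPatterns can then lower further.
%flat = vector.from_elements %e0, %e1, %e2, %e3 : vector<4xf32>
%v = vector.shape_cast %flat : vector<4xf32> to vector<2x2xf32>
```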

@banach-space
Contributor

The idea is to have a vector-to-vector lowering here, and to keep the actual vector-to-llvm conversion simple and restricted to already 'lowered' vector ops.

Keeping vector-to-llvm simple makes a lot of sense. Expanding populateVectorShapeCastLoweringPatterns (or creating patterns that inject vector.shape_cast and then leveraging populateVectorShapeCastLoweringPatterns) less so - wouldn't we be simply shifting the complexity within vector-to-llvm (as opposed to making it simpler)?

@yangtetris
Contributor Author

IIUC, we now have two ways to convert vector to LLVM:

  1. convert-vector-to-llvm pass. It includes two stages: a transformation stage that "lowers vector ops to other vector ops", and a conversion that "lowers vector ops to LLVM ops".
  2. convert-to-llvm: only includes the latter conversion stage.

We only need to support lowering multi-dim from_elements ops in convert-vector-to-llvm. Please correct me if I'm wrong.

Expanding populateVectorShapeCastLoweringPatterns (or creating patterns that inject vector.shape_cast and then leveraging populateVectorShapeCastLoweringPatterns) less so - wouldn't we be simply shifting the complexity within vector-to-llvm (as opposed to making it simpler)?

Our goal is keeping the conversion stage simple, right? So it is ok to shift the complexity to populateVectorShapeCastLoweringPatterns or populateVectorFromElementsOpLoweringPatterns, which both belong to the transformation stage.

Or my suggestion was to add a pattern there to split a rank-D from_elements op into a rank-1 from_elements op preceded by a shape_cast, and then populateVectorShapeCastLoweringPatterns will kick in and generate the inserts/extracts.

Either using shape_cast or iteratively unrolling N-D vectors is OK with me. I'd like to do some experiments to study which method generates more efficient operations.

@newling
Contributor

newling commented Aug 8, 2025

IIUC, we now have two ways to convert vector to LLVM:

  1. convert-vector-to-llvm pass. It includes two stages: a transformation stage that "lowers vector ops to other vector ops", and a conversion that "lowers vector ops to LLVM ops".
  2. convert-to-llvm: only includes the latter conversion stage.

We only need to support lowering multi-dim from_elements ops in convert-vector-to-llvm. Please correct me if I'm wrong.

Expanding populateVectorShapeCastLoweringPatterns (or creating patterns that inject vector.shape_cast and then leveraging populateVectorShapeCastLoweringPatterns) less so - wouldn't we be simply shifting the complexity within vector-to-llvm (as opposed to making it simpler)?

Our goal is keeping the conversion stage simple, right? So it is ok to shift the complexity to populateVectorShapeCastLoweringPatterns or populateVectorFromElementsOpLoweringPatterns, which both belong to the transformation stage.

Or my suggestion was to add a pattern there to split a rank-D from_elements op into a rank-1 from_elements op preceded by a shape_cast, and then populateVectorShapeCastLoweringPatterns will kick in and generate the inserts/extracts.

Either using shape_cast or iteratively unrolling N-D vectors is OK with me. I'd like to do some experiments to study which method generates more efficient operations.

I don't think populateVectorShapeCastLoweringPatterns will need changing. The only difference in my mind between the 2 approaches is the pattern that goes inside populateVectorFromElementsOpLoweringPatterns. Option 'direct' will have a pattern which goes to inserts/extracts, basically the same logic as in this PR currently. Option 'shape_cast' will have a pattern which goes to shape_cast. Then populateVectorShapeCastLoweringPatterns already has the logic to lower from shape_cast to inserts/extracts IIRC.

I can see pros/cons with both approaches. One pro of the shape_cast pattern is slightly less new code (although it's probably only slightly less, as this PR in its current state isn't large). One con is that the lowering path is longer / less intuitive. At this point I have no strong preference, and I think exploring both is a good idea -- thank you!

@Groverkss
Member

The idea is to have a vector-to-vector lowering here, and to keep the actual vector-to-llvm conversion simple and restricted to already 'lowered' vector ops.

So I guess the request is to implement a function populateVectorFromElementsOpLoweringPatterns. Or my suggestion was to add a pattern there to split a rank-D from_elements op into a rank-1 from_elements op preceded by a shape_cast, and then populateVectorShapeCastLoweringPatterns will kick in and generate the inserts/extracts.

I'm not sure about this approach. There are two approaches that can be taken in general: unrolling or flattening. I would much rather we do unrolling for lowering. While flattening looks nice, it can sometimes generate shuffles (by generating extract_strided_slice/insert_strided_slice).

We also need to be consistent across lowerings; if one lowering does flattening while another does unrolling, we start relying too much on LLVM to clean up code that we generated.

@Groverkss
Member

IIUC, we now have two ways to convert vector to LLVM:

  1. convert-vector-to-llvm pass. It includes two stages: a transformation stage that "lowers vector ops to other vector ops", and a conversion that "lowers vector ops to LLVM ops".
  2. convert-to-llvm: only includes the latter conversion stage.

We only need to support lowering multi-dim from_elements ops in convert-vector-to-llvm. Please correct me if I'm wrong.

Expanding populateVectorShapeCastLoweringPatterns (or creating patterns that inject vector.shape_cast and then leveraging populateVectorShapeCastLoweringPatterns) less so - wouldn't we be simply shifting the complexity within vector-to-llvm (as opposed to making it simpler)?

Our goal is keeping the conversion stage simple, right? So it is ok to shift the complexity to populateVectorShapeCastLoweringPatterns or populateVectorFromElementsOpLoweringPatterns, which both belong to the transformation stage.

Or my suggestion was to add a pattern there to split a rank-D from_elements op into a rank-1 from_elements op preceded by a shape_cast, and then populateVectorShapeCastLoweringPatterns will kick in and generate the inserts/extracts.

Either using shape_cast or iteratively unrolling N-D vectors is OK with me. I'd like to do some experiments to study which method generates more efficient operations.

I think you are looking at this differently than I do. The main thing we are trying to do is:

Convert vector dialect operations on N-D vectors to LLVM (only 1-D vectors) and SPIRV (1-D vectors + other restrictions).

For an N-D vector operation, you can write a simple conversion as this patch did, where you take an N-D operation and directly lower it to LLVM using llvm.extractvalue/llvm.insertvalue to model N-D vectors. This has multiple problems:

  1. You are relying on LLVM to clean up chains of llvm.extractvalue/llvm.insertvalue. If you accidentally mix directly lowered vector operations using llvm.extractvalue/llvm.insertvalue with unrolled operations, you are using LLVM as a magic box that will fix it for you.
  2. In the long term, we do not want to emit llvm.extractvalue/llvm.insertvalue at all. Have a look at the recent RFC: https://discourse.llvm.org/t/rfc-towards-disallowing-struct-array-ir-values/87154
  3. It's hard to maintain consistency like this. There are two ways to go from N-D vectors to 1-D vectors: unrolling operations or flattening them. If you mix these accidentally during conversion, you will never be able to clean up the boundary between a flattened operation and an unrolled operation.
  4. And probably the most important thing: LLVM is not the only backend that the vector dialect lowers to. SPIRV is a supported backend and also needs unrolling.

How we ideally want to structure conversion to backends is:

  • Set of patterns to do unrolling from N-D vectors to 1-D vectors
  • Set of patterns to do flattening from N-D vectors to 1-D vectors (in case someone wants to do this; we don't have patterns for this today)
  • Set of patterns to cleanup boundary ops created by unrolling/flattening
  • Set of patterns to convert 1-D vector dialect operations to LLVM dialect
  • Set of patterns to convert 1-D vector (+ spirv restrictions) dialect operations to SPIRV dialect

With these patterns, we can build any of the passes we have above. But we do not want to mix things between these sets of patterns. For example, we should never have an N-D vector dialect operation conversion to the LLVM dialect, because that breaks the whole cleanup contract and we get no reuse for SPIRV.

@dcaballe
Contributor

dcaballe commented Aug 8, 2025

I would much rather we do unrolling for lowering. While flattening looks nice, it can sometimes generate shuffles (by generating extract_strided_slice/insert_strided_slice).

I don’t think we can make a call on unrolling vs. linearization. Unrolling will bloat the code size when unrolling a large dimension, whereas linearization will generate fewer ops (best case, a single op). Vector shuffles will be generated anyway by LLVM regardless of what we do at the MLIR level. The right call is probably project-dependent.

my suggestion was to add a pattern there to split a rank-D from_elements op into a rank-1 from_elements op preceded by a shape_cast, and then populateVectorShapeCastLoweringPatterns will kick in and generate the inserts/extracts.

The shape cast implementation makes sense to me if we decouple it from the actual lowering to LLVM. It’s basically the vector linearization flavor so that probably should go into VectorLinearize.cpp. I think it’s important that we keep the linearization-like patterns focused on shape cast so that we implement an optimized lowering for linearization patterns in just one place.

How we ideally want to structure conversion to backends is:

  • Set of patterns to do unrolling from N-D vectors to 1-D vectors
  • Set of patterns to do flattening from N-D vectors to 1-D vectors (in case someone wants to do this; we don't have patterns for this today)
  • Set of patterns to cleanup boundary ops created by unrolling/flattening
  • Set of patterns to convert 1-D vector dialect operations to LLVM dialect
  • Set of patterns to convert 1-D vector (+ spirv restrictions) dialect operations to SPIRV dialect

This makes sense to me, with a twist. IMO, the main problem is that ConvertVectorToLLVM currently does far more than just converting to LLVM. It makes algorithmic decisions, such as how to lower a contract op or a transpose, which aren't necessarily driven by LLVM constraints. These choices grew out of the early implementations available at the time, which can make the pass challenging to use in production nowadays.

Expanding on @Groverkss' point, I believe we should decouple:

  • Algorithmic decision: Move algorithmic decision out of the vector to LLVM conversion. These transformations could happen within a single or multiple configurable passes/transformations.
  • LLVM/SPIR-V legalization constraints: Reframe ConvertVectorToLLVM/SPIRV as one (or two) dedicated legalization pass(es). These passes should strictly focus on applying vector-to-vector transformations based on LLVM/SPIRV constraints and should be configurable to support both unrolling and linearization.
  • Actual lowering to LLVM: Leverage the LLVM conversion interface to perform the final, straightforward conversion to LLVM.

Deciding on the actual direction here is something we should prioritize, as it would require quite some work and coordination. This would be a great topic for the Tensor Compiler WG.

For this PR specifically, my suggestion is that we:

  • Add the shape cast implementation to VectorLinearize.cpp (one PR).
  • Add the unrolling implementation, using the direct implementation, similar to what other unrolling patterns do (one PR).

WDYT?

@yangtetris
Contributor Author

How we ideally want to structure conversion to backends is:

Set of patterns to do unrolling from N-D vectors to 1-D vectors
Set of patterns to do flattening from N-D vectors to 1-D vectors (in case someone wants to do this, we dont have patterns for this today)
Set of patterns to cleanup boundary ops created by unrolling/flattening
Set of patterns to convert 1-D vector dialect operations to LLVM dialect
Set of patterns to convert 1-D vector (+ spirv restrictions) dialect operations to SPIRV dialect.

Thank you for your detailed explanation, this is very helpful for understanding why keeping the conversion logic simple is important.

Unrolling will bloat the code size when unrolling a large dimension whereas linearization will generate fewer ops (best case a single op).

There could be some different cases to consider here. Here is a comparison between unrolling and flattening for an ordinary vector type, <2x2x3x4x7x11xi32>. I was surprised that there's not much difference in the lengths of the generated IR sequences.

| | Unrolling | Flattening |
| --- | --- | --- |
| Process | vector<2x2x3x4x7x11xi32> -> 2 x vector<2x3x4x7x11xi32> -> 2x2 x vector<3x4x7x11xi32> -> ... -> 2x2x3x4x7 x vector<11xi32> | from_elements: vector<3696xi32>, then shape_cast: vector<3696xi32> -> vector<2x2x3x4x7x11xi32> |
| llvm.insertelement | 3696 | 3696 |
| llvm.insertvalue | 402 | 336 |
| llvm.mlir.constant | 3696 | 3696 |
| llvm.mlir.poison | 336 | 336 |
| Total | 8130 | 8064 |

What's more, after CSE, the unrolling method will have a much shorter length, because there are only 11 unique constant ops for indices.

For this PR specifically, my suggestion is that we:

Add the shape cast implementation to VectorLinearize.cpp (one PR).
Add the unrolling implementation, using the direct implementation, similar to what other unrolling patterns do (one PR).

That makes sense to me. Users should be able to choose between unrolling or flattening (temporarily, we can choose one to integrate into convert-vector-to-llvm). Could you please further elaborate on what "using the direct implementation" refers to? My plan is to implement one following the example of UnrollVectorGather.

@Groverkss
Member

I don’t think we can make a call on unrolling vs. linearization. Unrolling will bloat the code size when unrolling a large dimension, whereas linearization will generate fewer ops (best case, a single op). Vector shuffles will be generated anyway by LLVM regardless of what we do at the MLIR level. The right call is probably project-dependent.

I agree the right call is project-dependent, but I don't agree with "vector shuffles will be generated anyway by LLVM...". Take, for example:

a = load : vector<2x8xf32>
b = elementwise a : vector<2x8xf32>
store b : vector<2x8xf32>

Unrolling:

a_0 = load: vector<8xf32>
a_1 = load: vector<8xf32>
b_0 = elementwise a_0 : vector<8xf32>
b_1 = elementwise a_1 : vector<8xf32>
store b_0 : vector<8xf32>
store b_1 : vector<8xf32>

Flattening:

// Loads cannot be flattened; they have to be unrolled and then
// shuffled into a single vector.
a_0 = load : vector<8xf32>
a_1 = load : vector<8xf32>
a = shuffle a_0, a_1 : vector<8xf32>, vector<8xf32> -> vector<16xf32>
b = elementwise a : vector<16xf32>
b_0 = shufflevector b, poison : vector<16xf32>, vector<16xf32> -> vector<8xf32>
b_1 = shufflevector b, poison : vector<16xf32>, vector<16xf32> -> vector<8xf32>
// Stores cannot be flattened; they have to be sliced using shuffles.
store b_0 : vector<8xf32>
store b_1 : vector<8xf32>

In the unrolling IR, we have no shuffles, while in the flattening IR we do have shuffles.

This is what I meant by my comment: locally, flattening looks nice because you get a single operation on a wide vector, but it has implications across the entire IR that need to be considered properly.

@dcaballe
Contributor

What's more, after CSE, the unrolling method will have a much shorter length, because there are only 11 unique constant ops for indices.

Unrolling would generate 11-element vectors, which is a size the backend would have to legalize with padding or other techniques. We would have to look into the specifics of the example to understand what is happening but, in any case, my point was that we need to support both approaches. This shouldn't be about using one vs. the other.

I don't agree with "vector shuffles will be generated anyways by LLVM..."

The example only shows elementwise operations. For cases requiring actual shape/layout transformations, shuffles (or similar ops) would be needed, right? That was the point I was trying to make.

Could you please further elaborate on what "using the direct implementation" refers to? My plan is to implement one following the example of UnrollVectorGather.

For the vector linearization approach, we would need to add a pattern to VectorLinearize.cpp that turns the n-D vector.from_elements op into a 1-D vector.from_elements op + vector.shape_cast.

For the vector unrolling approach, doing something like the vector gather example makes sense. The direct implementation refers to implementing the unrolling manually, using vector.insert/vector.extract operations.
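For illustration, a hand-written sketch of what such an unrolled form might look like for a rank-2 case (the actual pattern may structure this differently; the use of `ub.poison` as the initial value is an assumption):

```mlir
// Each innermost row becomes a rank-1 from_elements, and the rows are
// assembled with vector.insert; the remaining 1-D ops can then be
// converted to LLVM by the existing conversion patterns.
%row0 = vector.from_elements %e0, %e1 : vector<2xf32>
%row1 = vector.from_elements %e2, %e3 : vector<2xf32>
%init = ub.poison : vector<2x2xf32>
%tmp  = vector.insert %row0, %init [0] : vector<2xf32> into vector<2x2xf32>
%v    = vector.insert %row1, %tmp [1] : vector<2xf32> into vector<2x2xf32>
```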
