[mlir][vector] Support multi-dimensional vectors in VectorFromElementsLowering #151175

Open · wants to merge 4 commits into base: main
Conversation

Contributor

@yangtetris yangtetris commented Jul 29, 2025

This patch extends the VectorFromElementsLowering conversion pattern to support
vectors of any rank, removing the previous restriction to 0D/1D vectors only.

Implementation Details:

  1. 0D vectors: Handled explicitly since LLVMTypeConverter converts them to
    length-1 1D vectors
  2. 1D vectors: Direct construction using llvm.insertelement operations
  3. N-D vectors: Two-phase construction:
    • Build 1D vectors for the innermost dimension using llvm.insertelement
    • Assemble them into the nested aggregate structure using llvm.insertvalue
      and nDVectorIterate
  4. Use direct LLVM dialect operations instead of intermediate vector.insert operations for efficiency

Example:

// Before: Failed for rank > 1
%v = vector.from_elements %e0, %e1, %e2, %e3 : vector<2x2xf32>

// After: Converts to nested aggregate
%poison = llvm.mlir.poison : !llvm.array<2 x vector<2xf32>>
%inner0 = llvm.insertelement %e0, %poison_1d[%c0] : vector<2xf32>
%inner0 = llvm.insertelement %e1, %inner0[%c1] : vector<2xf32>
%inner1 = llvm.insertelement %e2, %poison_1d[%c0] : vector<2xf32>
%inner1 = llvm.insertelement %e3, %inner1[%c1] : vector<2xf32>
%result = llvm.insertvalue %inner0, %poison[0] : !llvm.array<2 x vector<2xf32>>
%result = llvm.insertvalue %inner1, %result[1] : !llvm.array<2 x vector<2xf32>>

@llvmbot
Member

llvmbot commented Jul 29, 2025

@llvm/pr-subscribers-mlir-vector
@llvm/pr-subscribers-mlir-llvm

@llvm/pr-subscribers-mlir

Author: Yang Bai (yangtetris)

Full diff: https://github.com/llvm/llvm-project/pull/151175.diff

2 Files Affected:

  • (modified) mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp (+55-8)
  • (modified) mlir/test/Conversion/VectorToLLVM/vector-to-llvm-interface.mlir (+24)
diff --git a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
index 17a79e3815b97..26d056cadb19c 100644
--- a/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
+++ b/mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVM.cpp
@@ -1890,15 +1890,62 @@ struct VectorFromElementsLowering
                   ConversionPatternRewriter &rewriter) const override {
     Location loc = fromElementsOp.getLoc();
     VectorType vectorType = fromElementsOp.getType();
-    // TODO: Multi-dimensional vectors lower to !llvm.array<... x vector<>>.
-    // Such ops should be handled in the same way as vector.insert.
-    if (vectorType.getRank() > 1)
-      return rewriter.notifyMatchFailure(fromElementsOp,
-                                         "rank > 1 vectors are not supported");
     Type llvmType = typeConverter->convertType(vectorType);
-    Value result = LLVM::PoisonOp::create(rewriter, loc, llvmType);
-    for (auto [idx, val] : llvm::enumerate(adaptor.getElements()))
-      result = vector::InsertOp::create(rewriter, loc, val, result, idx);
+    Type llvmIndexType = typeConverter->convertType(rewriter.getIndexType());
+
+    Value result;
+    // 0D vectors are converted to length-1 1D vectors by LLVMTypeConverter.
+    if (vectorType.getRank() == 0) {
+      result = LLVM::PoisonOp::create(rewriter, loc, llvmType);
+      auto index0 = LLVM::ConstantOp::create(rewriter, loc, llvmIndexType, 0);
+      result = LLVM::InsertElementOp::create(rewriter, loc, result, adaptor.getElements().front(), index0);
+      rewriter.replaceOp(fromElementsOp, result);
+      return success();
+    }
+    
+    // Build 1D vectors for the innermost dimension
+    int64_t innerDimSize = vectorType.getShape().back();
+    int64_t numInnerVectors = vectorType.getNumElements() / innerDimSize;
+
+    SmallVector<Value> innerVectors;
+    innerVectors.reserve(numInnerVectors);
+
+    auto innerVectorType = VectorType::get(innerDimSize, vectorType.getElementType());
+    Type llvmInnerType = typeConverter->convertType(innerVectorType);
+
+    int64_t elementInVectorIdx = 0;
+    Value innerVector;
+    for (auto val : adaptor.getElements()) {
+      if (elementInVectorIdx == 0)
+        innerVector = LLVM::PoisonOp::create(rewriter, loc, llvmInnerType);
+      auto position = LLVM::ConstantOp::create(rewriter, loc, llvmIndexType, elementInVectorIdx);
+      innerVector = LLVM::InsertElementOp::create(rewriter, loc, llvmInnerType, innerVector, val, position);
+      if (++elementInVectorIdx == innerDimSize) {
+        innerVectors.push_back(innerVector);
+        elementInVectorIdx = 0;
+      }
+    }
+
+    // For 1D vectors, we can just return the first innermost vector.
+    if (vectorType.getRank() == 1) {
+      rewriter.replaceOp(fromElementsOp, innerVectors.front());
+      return success();
+    }
+
+    // Now build the nested aggregate structure from these 1D vectors.
+    result = LLVM::PoisonOp::create(rewriter, loc, llvmType);
+    
+    // Use the same iteration approach as VectorBroadcastScalarToNdLowering to
+    // insert the 1D vectors into the aggregate.
+    auto vectorTypeInfo = LLVM::detail::extractNDVectorTypeInfo(vectorType, *getTypeConverter());
+    if (!vectorTypeInfo.llvmNDVectorTy)
+      return failure();
+    int64_t vectorIdx = 0;
+    nDVectorIterate(vectorTypeInfo, rewriter, [&](ArrayRef<int64_t> position) {
+      result = LLVM::InsertValueOp::create(rewriter, loc, result, 
+                                           innerVectors[vectorIdx++], position);
+    });
+    
     rewriter.replaceOp(fromElementsOp, result);
     return success();
   }
diff --git a/mlir/test/Conversion/VectorToLLVM/vector-to-llvm-interface.mlir b/mlir/test/Conversion/VectorToLLVM/vector-to-llvm-interface.mlir
index 31e17fb3e3cc6..834858c0b7c8f 100644
--- a/mlir/test/Conversion/VectorToLLVM/vector-to-llvm-interface.mlir
+++ b/mlir/test/Conversion/VectorToLLVM/vector-to-llvm-interface.mlir
@@ -2286,6 +2286,30 @@ func.func @from_elements_0d(%arg0: f32) -> vector<f32> {
 
 // -----
 
+// CHECK-LABEL: func.func @from_elements_3d(
+//  CHECK-SAME:     %[[ARG_0:.*]]: f32, %[[ARG_1:.*]]: f32, %[[ARG_2:.*]]: f32, %[[ARG_3:.*]]: f32)
+//       CHECK:   %[[UNDEF_VEC0:.*]] = llvm.mlir.poison : vector<2xf32>
+//       CHECK:   %[[C0_0:.*]] = llvm.mlir.constant(0 : i64) : i64
+//       CHECK:   %[[VEC0_0:.*]] = llvm.insertelement %[[ARG_0]], %[[UNDEF_VEC0]][%[[C0_0]] : i64] : vector<2xf32>
+//       CHECK:   %[[C1_0:.*]] = llvm.mlir.constant(1 : i64) : i64
+//       CHECK:   %[[VEC0_1:.*]] = llvm.insertelement %[[ARG_1]], %[[VEC0_0]][%[[C1_0]] : i64] : vector<2xf32>
+//       CHECK:   %[[UNDEF_VEC1:.*]] = llvm.mlir.poison : vector<2xf32>
+//       CHECK:   %[[C0_1:.*]] = llvm.mlir.constant(0 : i64) : i64
+//       CHECK:   %[[VEC1_0:.*]] = llvm.insertelement %[[ARG_2]], %[[UNDEF_VEC1]][%[[C0_1]] : i64] : vector<2xf32>
+//       CHECK:   %[[C1_1:.*]] = llvm.mlir.constant(1 : i64) : i64
+//       CHECK:   %[[VEC1_1:.*]] = llvm.insertelement %[[ARG_3]], %[[VEC1_0]][%[[C1_1]] : i64] : vector<2xf32>
+//       CHECK:   %[[UNDEF_RES:.*]] = llvm.mlir.poison : !llvm.array<2 x array<1 x vector<2xf32>>>
+//       CHECK:   %[[RES_0:.*]] = llvm.insertvalue %[[VEC0_1]], %[[UNDEF_RES]][0, 0] : !llvm.array<2 x array<1 x vector<2xf32>>>
+//       CHECK:   %[[RES_1:.*]] = llvm.insertvalue %[[VEC1_1]], %[[RES_0]][1, 0] : !llvm.array<2 x array<1 x vector<2xf32>>>
+//       CHECK:   %[[CAST:.*]] = builtin.unrealized_conversion_cast %[[RES_1]] : !llvm.array<2 x array<1 x vector<2xf32>>> to vector<2x1x2xf32>
+//       CHECK:   return %[[CAST]]
+func.func @from_elements_3d(%arg0: f32, %arg1: f32, %arg2: f32, %arg3: f32) -> vector<2x1x2xf32> {
+  %0 = vector.from_elements %arg0, %arg1, %arg2, %arg3 : vector<2x1x2xf32>
+  return %0 : vector<2x1x2xf32>
+}
+
+// -----
+
 //===----------------------------------------------------------------------===//
 // vector.to_elements
 //===----------------------------------------------------------------------===//


github-actions bot commented Jul 29, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Contributor

@dcaballe dcaballe left a comment

LGTM, just minor comments. Feel free to address them before landing. Thanks!

Comment on lines 1942 to 1951
// Use the same iteration approach as VectorBroadcastScalarToNdLowering to
// insert the 1D vectors into the aggregate.
auto vectorTypeInfo =
LLVM::detail::extractNDVectorTypeInfo(vectorType, *getTypeConverter());
if (!vectorTypeInfo.llvmNDVectorTy)
return failure();
int64_t vectorIdx = 0;
nDVectorIterate(vectorTypeInfo, rewriter, [&](ArrayRef<int64_t> position) {
result = LLVM::InsertValueOp::create(rewriter, loc, result,
innerVectors[vectorIdx++], position);
Contributor

Is there a chance to refactor this code for both cases? This sounds like a common pattern that other ops might need as well...

Contributor Author

Yeah, other vector ops might also use this pattern. I just added a new overload of nDVectorIterate which accepts a VectorType and internally calls extractNDVectorTypeInfo. But I didn't change the usage in VectorBroadcastScalarToNdLowering, because it does some work that depends on the result of extractNDVectorTypeInfo before it can call nDVectorIterate.

yangtetris and others added 2 commits July 31, 2025 10:23
Co-authored-by: Nicolas Vasilache <[email protected]>
Contributor

@banach-space banach-space left a comment

Thanks, LGTM

I really appreciate your clear PR summaries, thank you!

[nit] `%poison = llvm.mlir.poison : !llvm.array<2 x vector<2xf32>>` -> `%poison_1d = llvm.mlir.poison : !llvm.array<2 x vector<2xf32>>`?

@newling
Contributor

newling commented Aug 6, 2025

What about having a pattern in the lowering that inserts a shape_cast (N-D -> 1-D) first? Will the resulting IR be worse? If so, does that mean the shape_cast lowering needs improvement?

Member

@Groverkss Groverkss left a comment

I don't think this is how it should be done for N-D vectors. This should be done the same way as other vector ops, by separating the unrolling transformation from the conversion.

Most LLVM conversions only support 1-D/0-D vectors, and there are separate transformations which unroll.

@Groverkss
Member

This should be implemented the same way #132227 is implemented. We should never do unrolling like this in conversion. It should always be a separate pattern.

@dcaballe
Contributor

dcaballe commented Aug 6, 2025

What about having a pattern in the lowering that inserts a shape_cast (N-D -> 1-D) first? Will the resulting IR be worse? If so, does that mean the shape_cast lowering needs improvement?

IIRC, there is no lowering for shape cast ops in this set of patterns so that would create a dependency with the independent lowering of shape cast ops. This definitely needs some work as I'm not sure keeping them separate makes sense anymore. Something we should address separately.

This should be implemented the same way #132227 is implemented.

This makes sense to me. We may want to revisit this for ops that already do it. My understanding is that this is mostly inspired by the existing conversion of vector.insert.

@yangtetris
Contributor Author

This should be implemented the same way #132227 is implemented. We should never do unrolling like this in conversion. It should always be a separate pattern.

Thank you for your feedback. Decoupling the unrolling transformation from the conversion seems like a good idea. However, I'm wondering: if the conversion pattern only supports 1-D vectors, how would we support converting multi-dim vectors in VectorToLLVMDialectInterface? There is no transformation stage in the convert-to-llvm pass. Or is it okay not to support from_elements with multi-dim vectors in convert-to-llvm?

@dcaballe
Contributor

dcaballe commented Aug 7, 2025

Or, is it okay to not support from_elements with multi-dim vectors in the convert-to-llvm?

Yes, it makes sense that some independent prep work is required before the actual conversion to keep the conversion simpler, so just bailing out for multi-dim vectors should be ok.

@newling
Contributor

newling commented Aug 7, 2025

The idea is to have a vector-to-vector lowering here, and to keep the actual vector-to-llvm conversion simple and restricted to already 'lowered' vector ops.

So I guess the request is to implement a function populateVectorFromElementsOpLoweringPatterns. Or my suggestion was to add a pattern there to split a rank-D from_elements op into a rank-1 from_elements op preceded by a shape_cast, and then populateVectorShapeCastLoweringPatterns will kick in and generate the inserts/extracts.
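A rough MLIR sketch of that suggested rewrite (hand-written here for illustration, not output of any actual pattern; value names are made up):

```mlir
// Before: a rank-2 from_elements.
%v = vector.from_elements %e0, %e1, %e2, %e3 : vector<2x2xf32>

// After: a rank-1 from_elements followed by a shape_cast, which
// populateVectorShapeCastLoweringPatterns can then lower further.
%flat = vector.from_elements %e0, %e1, %e2, %e3 : vector<4xf32>
%v = vector.shape_cast %flat : vector<4xf32> to vector<2x2xf32>
```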

@banach-space
Contributor

The idea is to have a vector-to-vector lowering here, and to keep the actual vector-to-llvm conversion simple and restricted to already 'lowered' vector ops.

Keeping vector-to-llvm simple makes a lot of sense. Expanding populateVectorShapeCastLoweringPatterns (or creating patterns that inject vector.shape_cast and then leveraging populateVectorShapeCastLoweringPatterns) less so - wouldn't we be simply shifting the complexity within vector-to-llvm (as opposed to making it simpler)?

@yangtetris
Contributor Author

IIUC, we now have two ways to convert vector to LLVM:

  1. convert-vector-to-llvm pass. It includes two stages: a transformation stage that "lowers vector ops to other vector ops", and a conversion that "lowers vector ops to LLVM ops".
  2. convert-to-llvm: only includes the latter conversion stage.

We only need to support lowering multi-dim from_elements ops in convert-vector-to-llvm. Please correct me if I'm wrong.

Expanding populateVectorShapeCastLoweringPatterns (or creating patterns that inject vector.shape_cast and then leveraging populateVectorShapeCastLoweringPatterns) less so - wouldn't we be simply shifting the complexity within vector-to-llvm (as opposed to making it simpler)?

Our goal is keeping the conversion stage simple, right? So it is ok to shift the complexity to populateVectorShapeCastLoweringPatterns or populateVectorFromElementsOpLoweringPatterns, which both belong to the transformation stage.

Or my suggestion was to add a pattern there to split a rank-D from_elements op into a rank-1 from_elements op preceded by a shape_cast, and then populateVectorShapeCastLoweringPatterns will kick in and generate the inserts/extracts.

Either using shape_cast or iteratively unrolling N-D vectors is OK with me. I'd like to do some experiments to study which method generates more efficient operations.

@newling
Contributor

newling commented Aug 8, 2025

IIUC, we now have two ways to convert vector to LLVM:

  1. convert-vector-to-llvm pass. It includes two stages: a transformation stage that "lowers vector ops to other vector ops", and a conversion that "lowers vector ops to LLVM ops".
  2. convert-to-llvm: only includes the latter conversion stage.

We only need to support lowering multi-dim from_elements ops in convert-vector-to-llvm. Please correct me if I'm wrong.

Expanding populateVectorShapeCastLoweringPatterns (or creating patterns that inject vector.shape_cast and then leveraging populateVectorShapeCastLoweringPatterns) less so - wouldn't we be simply shifting the complexity within vector-to-llvm (as opposed to making it simpler)?

Our goal is keeping the conversion stage simple, right? So it is ok to shift the complexity to populateVectorShapeCastLoweringPatterns or populateVectorFromElementsOpLoweringPatterns, which both belong to the transformation stage.

Or my suggestion was to add a pattern there to split a rank-D from_elements op into a rank-1 from_elements op preceded by a shape_cast, and then populateVectorShapeCastLoweringPatterns will kick in and generate the inserts/extracts.

Either using shape_cast or iteratively unrolling N-D vectors is OK with me. I'd like to do some experiments to study which method generates more efficient operations.

I don't think populateVectorShapeCastLoweringPatterns will need changing. The only difference in my mind between the 2 approaches is the pattern that goes inside populateVectorFromElementsOpLoweringPatterns. Option 'direct' will have a pattern which goes to inserts/extracts, basically the same logic as in this PR currently. Option 'shape_cast' will have a pattern which goes to shape_cast. Then populateVectorShapeCastLoweringPatterns already has the logic to lower from shape_cast to inserts/extracts IIRC.

I can see pros/cons with both approaches. One pro of the shape_cast pattern is slightly less new code (although it's probably only slightly less, as this PR in its current state isn't large). One con is that the lowering path is longer / less intuitive. At this point I have no strong preference, and I think exploring both is a good idea -- thank you!

@Groverkss
Member

The idea is to have a vector-to-vector lowering here, and to keep the actual vector-to-llvm conversion simple and restricted to already 'lowered' vector ops.

So I guess the request is to implement a function populateVectorFromElementsOpLoweringPatterns. Or my suggestion was to add a pattern there to split a rank-D from_elements op into a rank-1 from_elements op preceded by a shape_cast, and then populateVectorShapeCastLoweringPatterns will kick in and generate the inserts/extracts.

I'm not sure about this approach. There are two approaches that can be taken in general: unrolling or flattening. I would much rather we do unrolling for lowering. While flattening looks nice, it can sometimes generate shuffles (by generating extract_strided_slice/insert_strided_slice).

We also need to be consistent across lowerings; if one lowering does flattening while another does unrolling, we start relying too much on LLVM to clean up code that we generated.

@Groverkss
Member

IIUC, we now have two ways to convert vector to LLVM:

  1. convert-vector-to-llvm pass. It includes two stages: a transformation stage that "lowers vector ops to other vector ops", and a conversion that "lowers vector ops to LLVM ops".
  2. convert-to-llvm: only includes the latter conversion stage.

We only need to support lowering multi-dim from_elements ops in convert-vector-to-llvm. Please correct me if I'm wrong.

Expanding populateVectorShapeCastLoweringPatterns (or creating patterns that inject vector.shape_cast and then leveraging populateVectorShapeCastLoweringPatterns) less so - wouldn't we be simply shifting the complexity within vector-to-llvm (as opposed to making it simpler)?

Our goal is keeping the conversion stage simple, right? So it is ok to shift the complexity to populateVectorShapeCastLoweringPatterns or populateVectorFromElementsOpLoweringPatterns, which both belong to the transformation stage.

Or my suggestion was to add a pattern there to split a rank-D from_elements op into a rank-1 from_elements op preceded by a shape_cast, and then populateVectorShapeCastLoweringPatterns will kick in and generate the inserts/extracts.

Either using shape_cast or iteratively unrolling N-D vectors is OK with me. I'd like to do some experiments to study which method generates more efficient operations.

I think you are looking at this differently than I do. The main thing we are trying to do is:

Convert vector dialect operations on N-D vectors to LLVM (only 1-D vectors) and SPIRV (1-D vectors + other restrictions).

For an N-D vector operation, you can write a simple conversion as this patch did, where you take an N-D operation and directly lower it to LLVM using llvm.extractvalue/llvm.insertvalue to model N-D vectors. This has multiple problems:

  1. You are relying on LLVM to clean up chains of llvm.extractvalue/llvm.insertvalue. If you accidentally mix directly lowered vector operations using llvm.extractvalue/llvm.insertvalue with unrolled operations, you are using LLVM as a magic box that will fix it for you.
  2. In the long term, we do not want to emit llvm.extractvalue/llvm.insertvalue at all. Have a look at the recent RFC: https://discourse.llvm.org/t/rfc-towards-disallowing-struct-array-ir-values/87154
  3. It's hard to maintain consistency like this. There are two ways to go from N-D vectors to 1-D vectors: unrolling operations or flattening them. If you mix these accidentally during conversion, you will never be able to clean up the boundary between a flattened operation and an unrolled operation.
  4. And probably the most important thing: LLVM is not the only backend that the vector dialect lowers to. SPIRV is a supported backend and also needs unrolling.

How we ideally want to structure conversion to backends is:

  • Set of patterns to do unrolling from N-D vectors to 1-D vectors
  • Set of patterns to do flattening from N-D vectors to 1-D vectors (in case someone wants to do this; we don't have patterns for this today)
  • Set of patterns to cleanup boundary ops created by unrolling/flattening
  • Set of patterns to convert 1-D vector dialect operations to LLVM dialect
  • Set of patterns to convert 1-D vector (+ spirv restrictions) dialect operations to SPIRV dialect

With these patterns, we can build any of the passes we have above. But we do not want to mix things between these sets of patterns. For example, we should never have an N-D vector dialect operation conversion to the LLVM dialect, because that breaks the whole cleanup contract and we get no reuse for SPIRV.

@dcaballe
Contributor

dcaballe commented Aug 8, 2025

I would much rather we do unrolling for lowering. While flattening looks nice, it can sometimes generate shuffles (by generating extract_strided_slice/insert_strided_slice).

I don’t think we can make a call on unrolling vs. linearization. Unrolling will bloat the code size when unrolling a large dimension, whereas linearization will generate fewer ops (best case, a single op). Vector shuffles will be generated anyway by LLVM regardless of what we do at the MLIR level. The right call is probably project-dependent.

my suggestion was to add a pattern there to split a rank-D from_elements op into a rank-1 from_elements op preceded by a shape_cast, and then populateVectorShapeCastLoweringPatterns will kick in and generate the inserts/extracts.

The shape cast implementation makes sense to me if we decouple it from the actual lowering to LLVM. It’s basically the vector linearization flavor so that probably should go into VectorLinearize.cpp. I think it’s important that we keep the linearization-like patterns focused on shape cast so that we implement an optimized lowering for linearization patterns in just one place.

How we ideally want to structure conversion to backends is:

  • Set of patterns to do unrolling from N-D vectors to 1-D vectors
  • Set of patterns to do flattening from N-D vectors to 1-D vectors (in case someone wants to do this; we don't have patterns for this today)
  • Set of patterns to cleanup boundary ops created by unrolling/flattening
  • Set of patterns to convert 1-D vector dialect operations to LLVM dialect
  • Set of patterns to convert 1-D vector (+ spirv restrictions) dialect operations to SPIRV dialect

This makes sense to me, with a twist. IMO, the main problem is that ConvertVectorToLLVM currently does far more than just converting to LLVM. It makes algorithmic decisions, such as how to lower a contract op or a transpose, which aren't necessarily driven by LLVM constraints. These choices grew out of the early implementations available at the time, which can make the pass challenging to use in production nowadays.

Expanding on @Groverkss' point, I believe we should decouple:

  • Algorithmic decision: Move algorithmic decision out of the vector to LLVM conversion. These transformations could happen within a single or multiple configurable passes/transformations.
  • LLVM/SPIR-V legalization constraints: Reframe ConvertVectorToLLVM/SPIRV as one (or two) dedicated legalization pass(es). These passes should strictly focus on applying vector-to-vector transformations based on LLVM/SPIRV constraints and should be configurable to support both unrolling and linearization.
  • Actual lowering to LLVM: Leverage the LLVM conversion interface to perform the final, straightforward conversion to LLVM.

Deciding on the actual direction here is something we should prioritize, as it would require quite some work and coordination. This would be a great topic for the Tensor Compiler WG.

For this PR specifically, my suggestion is that we:

  • Add the shape cast implementation to VectorLinearize.cpp (one PR).
  • Add the unrolling implementation, using the direct implementation, similar to what other unrolling patterns do (one PR).

WDYT?

@yangtetris
Contributor Author

How we ideally want to structure conversion to backends is:

Set of patterns to do unrolling from N-D vectors to 1-D vectors
Set of patterns to do flattening from N-D vectors to 1-D vectors (in case someone wants to do this, we dont have patterns for this today)
Set of patterns to cleanup boundary ops created by unrolling/flattening
Set of patterns to convert 1-D vector dialect operations to LLVM dialect
Set of patterns to convert 1-D vector (+ spirv restrictions) dialect operations to SPIRV dialect.

Thank you for your detailed explanation, this is very helpful for understanding why keeping the conversion logic simple is important.

Unrolling will bloat the code size when unrolling a large dimension whereas linearization will generate fewer ops (best case a single op).

There could be some different cases to consider here. Here is a comparison between unrolling and flattening for an ordinary vector type, <2x2x3x4x7x11xi32>. I was surprised that there's not much difference in the lengths of the generated IR sequences.

| | Unrolling | Flattening |
| --- | --- | --- |
| Process | vector<2x2x3x4x7x11xi32> -> 2 x vector<2x3x4x7x11xi32> -> 2x2 x vector<3x4x7x11xi32> -> ... -> 2x2x3x4x7 x vector<11xi32> | from_elements: vector<3696xi32>, then shape_cast: vector<3696xi32> -> vector<2x2x3x4x7x11xi32> |
| llvm.insertelement | 3696 | 3696 |
| llvm.insertvalue | 402 | 336 |
| llvm.mlir.constant | 3696 | 3696 |
| llvm.mlir.poison | 336 | 336 |
| Total | 8130 | 8064 |

What's more, after CSE, the unrolling method will have a much shorter length, because there are only 11 unique constant ops for indices.

For this PR specifically, my suggestion is that we:

Add the shape cast implementation to VectorLinearize.cpp (one PR).
Add the unrolling implementation, using the direct implementation, similar to what other unrolling patterns do (one PR).

That makes sense to me. Users should be able to choose between unrolling or flattening (temporarily, we can choose one to integrate into convert-vector-to-llvm). Could you please further elaborate on what "using the direct implementation" refers to? My plan is to implement one following the example of UnrollVectorGather.

@Groverkss
Member

I don’t think we can make a call on unrolling vs. linearization. Unrolling will bloat the code size when unrolling a large dimension, whereas linearization will generate fewer ops (best case, a single op). Vector shuffles will be generated anyway by LLVM regardless of what we do at the MLIR level. The right call is probably project-dependent.

I agree the right call is project-dependent, but I don't agree with "vector shuffles will be generated anyway by LLVM...". Take, for example:

a = load : vector<2x8xf32>
b = elementwise a : vector<2x8xf32>
store b : vector<2x8xf32>

Unrolling:

a_0 = load: vector<8xf32>
a_1 = load: vector<8xf32>
b_0 = elementwise a_0 : vector<8xf32>
b_1 = elementwise a_1 : vector<8xf32>
store b_0 : vector<8xf32>
store b_1 : vector<8xf32>

Flattening:

// Loads cannot be flattened; they have to be unrolled and then
// shuffled into a single vector.
a_0 = load : vector<8xf32>
a_1 = load : vector<8xf32>
a = shuffle a_0, a_1 : vector<8xf32>, vector<8xf32> -> vector<16xf32>
b = elementwise a : vector<16xf32>
b_0 = shufflevector b, poison : vector<16xf32>, vector<16xf32> -> vector<8xf32>
b_1 = shufflevector b, poison : vector<16xf32>, vector<16xf32> -> vector<8xf32>
// Stores cannot be flattened; they have to be sliced using shuffles.
store b_0 : vector<8xf32>
store b_1 : vector<8xf32>

In the unrolling IR, we have no shuffles, while in the flattening IR we do have shuffles.

This is what I meant by my comment: locally, flattening looks nice because you get a single operation on a wide vector, but it has implications across the entire IR that need to be considered properly.

@dcaballe
Contributor

What's more, after CSE, the unrolling method will have a much shorter length, because there are only 11 unique constant ops for indices.

Unrolling would generate 11-element vectors, which is a size the backend would have to legalize with padding or other techniques. We would have to look into the specifics of the example to understand what is happening but, in any case, my point was that we need to support both approaches. This shouldn't be about using one vs. the other.

I don't agree with "vector shuffles will be generated anyways by LLVM..."

The example only shows elementwise operations. For cases requiring actual shape/layout transformations, shuffles (or similar ops) would be needed, right? That was the point I was trying to make.

Could you please further elaborate on what "using the direct implementation" refers to? My plan is to implement one following the example of UnrollVectorGather.

For the vector linearization approach, we would need to add a pattern to VectorLinearize.cpp that turns the n-D vector.from_elements op into a 1-D vector.from_elements op + vector.shape_cast.

For the vector unrolling approach, doing something like the vector gather example makes sense. The direct implementation refers to implementing the unrolling manually, using vector.insert/vector.extract operations.
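For illustration, a hand-written sketch of what such an unrolled form might look like for a rank-2 case (the actual pattern may structure this differently; the use of `ub.poison` as the initial value is an assumption):

```mlir
// Each innermost row becomes a rank-1 from_elements, and the rows are
// assembled with vector.insert; the remaining 1-D ops can then be
// converted to LLVM by the existing conversion patterns.
%row0 = vector.from_elements %e0, %e1 : vector<2xf32>
%row1 = vector.from_elements %e2, %e3 : vector<2xf32>
%init = ub.poison : vector<2x2xf32>
%tmp  = vector.insert %row0, %init [0] : vector<2xf32> into vector<2x2xf32>
%v    = vector.insert %row1, %tmp [1] : vector<2xf32> into vector<2x2xf32>
```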
