Add support for vectorized null suppression for block serde #26919
Reviewer's Guide
Introduce dynamic SIMD feature detection and vectorized null suppression in block serialization using Java Vector API, refactor block encodings to leverage the new API, adjust benchmarks/tests for SIMD support, and verify correctness with new unit tests.
Class diagram for new and updated SIMD support classes
classDiagram
class SimdSupport {
<<interface>>
+boolean supportByteGeneric()
+boolean supportShortGeneric()
+boolean supportIntegerGeneric()
+boolean supportLongGeneric()
+boolean supportByteCompress()
+boolean supportShortCompress()
+boolean supportIntegerCompress()
+boolean supportLongCompress()
+static SimdSupport NONE
}
class IntelSimdSupport {
+IntelSimdSupport(OSType)
+boolean supportByteGeneric()
+boolean supportShortGeneric()
+boolean supportIntegerGeneric()
+boolean supportLongGeneric()
+boolean supportByteCompress()
+boolean supportShortCompress()
+boolean supportIntegerCompress()
+boolean supportLongCompress()
}
class AmdSimdSupport {
+AmdSimdSupport(OSType)
+boolean supportByteGeneric()
+boolean supportShortGeneric()
+boolean supportIntegerGeneric()
+boolean supportLongGeneric()
+boolean supportByteCompress()
+boolean supportShortCompress()
+boolean supportIntegerCompress()
+boolean supportLongCompress()
}
class GravitonSimdSupport {
+GravitonSimdSupport(OSType)
}
SimdSupport <|.. IntelSimdSupport
SimdSupport <|.. AmdSimdSupport
SimdSupport <|.. GravitonSimdSupport
class SimdSupportManager {
+static void initialize()
+static SimdSupport get()
+static boolean isInitialized()
}
class SimdUtils {
+static boolean isLinuxGraviton()
+static Optional<String> linuxCpuVendorId()
+static Set<String> readCpuFlags(OSType)
+static String normalizeFlag(String)
}
class SimdInitializer {
+SimdInitializer()
+SimdSupport simdSupport()
}
SimdSupportManager --> SimdSupport
SimdInitializer --> SimdSupportManager
IntelSimdSupport --> OSType
AmdSimdSupport --> OSType
GravitonSimdSupport --> OSType
SimdSupportManager --> TargetArch
SimdSupportManager --> OSType
SimdUtils --> OSType
Class diagram for updated EncoderUtil and block encoding classes
classDiagram
class EncoderUtil {
+static void setSimdSupport(SimdSupport)
+static void compressBytesWithNulls(SliceOutput, byte[], boolean[], int, int)
+static void compressShortsWithNulls(SliceOutput, short[], boolean[], int, int)
+static void compressIntsWithNulls(SliceOutput, int[], boolean[], int, int)
+static void compressLongsWithNulls(SliceOutput, long[], boolean[], int, int)
-static void compressBytesWithNullsVectorized(...)
-static void compressBytesWithNullsScalar(...)
-static void compressShortsWithNullsVectorized(...)
-static void compressShortsWithNullsScalar(...)
-static void compressIntsWithNullsVectorized(...)
-static void compressIntsWithNullsScalar(...)
-static void compressLongsWithNullsVectorized(...)
-static void compressLongsWithNullsScalar(...)
+static SimdSupport simd
}
class ByteArrayBlockEncoding {
+void writeBlock(...)
}
class ShortArrayBlockEncoding {
+void writeBlock(...)
}
class IntArrayBlockEncoding {
+void writeBlock(...)
}
class LongArrayBlockEncoding {
+void writeBlock(...)
}
EncoderUtil <.. ByteArrayBlockEncoding : uses
EncoderUtil <.. ShortArrayBlockEncoding : uses
EncoderUtil <.. IntArrayBlockEncoding : uses
EncoderUtil <.. LongArrayBlockEncoding : uses
3cf2de1
to
5da70b5
Compare
 * treat as Graviton (covers Graviton2/3 where the model may be Neoverse N1/V1/V2).
 */
public static boolean isLinuxGraviton()
{
Instead of custom detection logic, we should first use what https://github.com/oshi/oshi supports and only then fall back to custom parsing.
If we want to add SIMD detection logic in the SPI, I am less sure we want to use a third-party library, since the SPI should have very minimal dependencies, and the detection logic is simple enough that we can maintain it ourselves.
First of all, I don't think it should live in the SPI in the first place. In 99% of use cases, these kinds of optimizations will live in the trino-main module, which already depends on oshi. Having your own implementation, rather than relying on an external one, has a cost that we don't want to have.
The only reason this is needed in the SPI is that the BlockEncoding implementations live there, but to be honest, concrete implementations shouldn't be part of trino-spi. BlockEncodings can be part of a plugin, but the built-in ones should just be moved to trino-main.
I agree that we need to think about where such logic should be placed. We need to guarantee it is accessible from lib (the case for vectorized decoding in the Parquet reader), the SPI (the code for BlockEncoding, though that is debatable since we may move it to trino-main), trino-main, plugins, etc.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `core/trino-spi/src/main/java/io/trino/spi/block/EncoderUtil.java:45` </location>
<code_context>
+
private EncoderUtil() {}
+ public static void setSimdSupport(SimdSupport simdSupport)
+ {
+ simd = requireNonNull(simdSupport, "simdSupport is null");
</code_context>
<issue_to_address>
**issue (bug_risk):** setSimdSupport is public and mutable, which may allow unexpected reconfiguration at runtime.
Consider restricting the visibility of setSimdSupport or ensuring it cannot be called multiple times to prevent inconsistent state.
</issue_to_address>
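One way to act on Comment 1 is a set-once guard. The sketch below is only an illustration: `SimdMode` and `SetOnceSimdHolder` are hypothetical names that do not appear in the PR, and the enum merely stands in for the PR's `SimdSupport` type.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the PR's SimdSupport type, reduced to a marker for this sketch
enum SimdMode { NONE, VECTORIZED }

// Hypothetical holder showing a set-once configuration guard
final class SetOnceSimdHolder
{
    private static final AtomicReference<SimdMode> SIMD = new AtomicReference<>();

    private SetOnceSimdHolder() {}

    static void set(SimdMode mode)
    {
        Objects.requireNonNull(mode, "mode is null");
        // compareAndSet succeeds only for the first caller; later calls fail loudly
        if (!SIMD.compareAndSet(null, mode)) {
            throw new IllegalStateException("SIMD support already configured");
        }
    }

    static SimdMode get()
    {
        // Fall back to the scalar mode when nothing was configured
        SimdMode mode = SIMD.get();
        return mode == null ? SimdMode.NONE : mode;
    }
}
```

Note that a guard like this would conflict with a test teardown that resets the setting to `SimdSupport.NONE`, so tests would need a separate reset hook.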
### Comment 2
<location> `core/trino-spi/src/test/java/io/trino/spi/block/TestEncoderUtil.java:100-107` </location>
<code_context>
+ }
+ }
+
+ public static boolean[][] getIsNullArray(int length)
+ {
+ return new boolean[][] {
</code_context>
<issue_to_address>
**suggestion (testing):** Suggestion: Add more edge cases for isNull patterns.
Include cases with a single null at the start or end, and consecutive nulls in the middle, to improve test coverage.
```suggestion
public static boolean[][] getIsNullArray(int length)
{
return new boolean[][] {
all(false, length),
all(true, length),
alternating(length),
randomBools(length),
singleNullAtStart(length),
singleNullAtEnd(length),
consecutiveNullsInMiddle(length)};
}
private static boolean[] singleNullAtStart(int length)
{
boolean[] arr = new boolean[length];
if (length > 0) {
arr[0] = true;
}
return arr;
}
private static boolean[] singleNullAtEnd(int length)
{
boolean[] arr = new boolean[length];
if (length > 0) {
arr[length - 1] = true;
}
return arr;
}
private static boolean[] consecutiveNullsInMiddle(int length)
{
boolean[] arr = new boolean[length];
if (length >= 4) {
arr[length / 2 - 1] = true;
arr[length / 2] = true;
}
return arr;
}
```
</issue_to_address>
### Comment 3
<location> `core/trino-spi/src/test/java/io/trino/spi/block/TestEncoderUtil.java:49-50` </location>
<code_context>
+ }
+ }
+
+ @AfterAll
+ public static void resetSimd()
+ {
+ EncoderUtil.setSimdSupport(SimdSupport.NONE);
</code_context>
<issue_to_address>
**nitpick (testing):** Nitpick: Consider resetting SimdSupport before each test for isolation.
Resetting SimdSupport before each test, such as with @BeforeEach, will prevent state leakage between tests and improve reliability.
</issue_to_address>
Discussed this offline, from here we want to:
|
e657e5a
to
c5eaac9
Compare
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: EC2 Default User.
|
c5eaac9
to
5719c20
Compare
package io.trino.util;

import com.google.common.base.StandardSystemProperty;
import org.weakref.jmx.$internal.guava.collect.ImmutableSet;
Wrong import path

import static java.lang.Math.min;
import static java.util.Locale.ENGLISH;
import static org.weakref.jmx.$internal.guava.collect.ImmutableSet.toImmutableSet;
Wrong import path

private EncoderUtil() {}

public static void setSimdSupport(SimdSupport simdSupport)
Using a static initializer here is undesirable. Let's defer the decision about whether to use SIMD for each operation to the caller and just make the static methods available for both scalar and vectorized encoding.
import static java.util.Objects.checkFromIndexSize;
import static java.util.Objects.requireNonNull;

public final class EncoderUtil
Not sure this class needs to become public if we remove the setSimdSupport method, maybe not.
// Bytes
public static void compressBytesWithNulls(SliceOutput sliceOutput, byte[] values, boolean[] isNull, int offset, int length)
{
    if (simd.enableBlockSerdeVectorizedNullSuppression() && simd.supportByteGeneric() && simd.supportByteCompress() && length >= SIMD_OPTIMIZATION_LENGTH_THRESHOLD) {
Let's reorganize this logic by:
- Moving simd.enableBlockSerdeVectorizedNullSuppression() && simd.supportByteGeneric() (and equivalent methods for other primitive types) into the caller context and have it either call the vectorized or non-vectorized method directly
- Moving the check for length >= SIMD_OPTIMIZATION_LENGTH_THRESHOLD into the beginning of the vectorized method (to call directly into the scalar implementation)
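The reorganization suggested above could look roughly like the sketch below. It is not the PR's code: `ByteNullSuppressionSketch`, `writeValues`, the threshold value of 64, and the `CHUNK` size are assumptions, `ByteArrayOutputStream` replaces `SliceOutput` to keep the example self-contained, and `isNull` is assumed to share the same offset as `values`. The chunked loop only stands in for the Vector API compress step, which needs the incubating `jdk.incubator.vector` module.

```java
import java.io.ByteArrayOutputStream;
import java.util.Objects;

final class ByteNullSuppressionSketch
{
    // Assumed value; the PR names this constant but the excerpt does not show its value
    static final int SIMD_OPTIMIZATION_LENGTH_THRESHOLD = 64;
    // Stand-in for a vector lane count; real code would take this from the Vector API species
    private static final int CHUNK = 16;

    private ByteNullSuppressionSketch() {}

    // Caller context: the capability decision is made here, outside the encoding helpers
    static void writeValues(ByteArrayOutputStream out, byte[] values, boolean[] isNull, int offset, int length, boolean vectorizedSupported)
    {
        if (vectorizedSupported) {
            compressBytesWithNullsVectorized(out, values, isNull, offset, length);
        }
        else {
            compressBytesWithNullsScalar(out, values, isNull, offset, length);
        }
    }

    static void compressBytesWithNullsVectorized(ByteArrayOutputStream out, byte[] values, boolean[] isNull, int offset, int length)
    {
        // Short inputs go straight to the scalar implementation
        if (length < SIMD_OPTIMIZATION_LENGTH_THRESHOLD) {
            compressBytesWithNullsScalar(out, values, isNull, offset, length);
            return;
        }
        Objects.checkFromIndexSize(offset, length, values.length);
        byte[] packed = new byte[CHUNK];
        int end = offset + length;
        int position = offset;
        // Chunked loop standing in for the vector compress-and-store step
        for (; position + CHUNK <= end; position += CHUNK) {
            int count = 0;
            for (int lane = 0; lane < CHUNK; lane++) {
                if (!isNull[position + lane]) {
                    packed[count++] = values[position + lane];
                }
            }
            out.write(packed, 0, count);
        }
        // Tail shorter than one chunk is handled by the scalar method
        compressBytesWithNullsScalar(out, values, isNull, position, end - position);
    }

    static void compressBytesWithNullsScalar(ByteArrayOutputStream out, byte[] values, boolean[] isNull, int offset, int length)
    {
        Objects.checkFromIndexSize(offset, length, values.length);
        // Write only the non-null values, in order
        for (int i = offset; i < offset + length; i++) {
            if (!isNull[i]) {
                out.write(values[i]);
            }
        }
    }
}
```

With this shape the encoding helpers hold no static state, and both paths can be tested against each other directly.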
 * Implementations should answer true only when the operation is supported
 * without emulation on the current CPU/JVM.
 */
public interface SimdSupport
Instead of making this an interface with each CPU architecture defined separately, let's do CPU specific parsing and just emit a simple record class with fields for supported operations, e.g.:
public record SimdSupport(
boolean nativeExpandAndCompressByte,
boolean nativeExpandAndCompressShort,
...
)
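As a rough illustration of this suggestion, the sketch below builds such a record from a set of CPU flag names. Only the two field names shown above are used; the remaining fields are left out rather than guessed. The `fromCpuFlags` factory and its flag mapping are assumptions: on x86, the byte and short compress/expand instructions belong to AVX-512 VBMI2, which Linux reports as `avx512_vbmi2`, but the PR may gate on different flags.

```java
import java.util.Set;

// Field names follow the reviewer's example; fields for other types are omitted in this sketch
record SimdSupport(boolean nativeExpandAndCompressByte, boolean nativeExpandAndCompressShort)
{
    static final SimdSupport NONE = new SimdSupport(false, false);

    // Hypothetical factory: the flag-to-capability mapping is an assumption, not the PR's logic
    static SimdSupport fromCpuFlags(Set<String> flags)
    {
        // Byte and short compress/expand are part of AVX-512 VBMI2 on x86
        boolean vbmi2 = flags.contains("avx512f") && flags.contains("avx512_vbmi2");
        return new SimdSupport(vbmi2, vbmi2);
    }
}
```

Because a record is a plain value, callers can read the booleans once and pick the scalar or vectorized method without any per-architecture class hierarchy.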
51f18a4
to
c583645
Compare
c583645
to
c40b2b0
Compare
Description
This PR adds support for vectorized null suppression for block serde using the Java Vector API.
Functionalities added
Detect SIMD support from the CPU.
This functionality is essential to prevent regressions. Even though the Java Vector API is platform-agnostic, it only guarantees correctness, not a performance improvement. If the JVM is running on an older CPU without decent SIMD support, the Java Vector API may fall back to emulated execution instead of real SIMD execution. So we want an extra gating layer to make sure that such a fallback does not happen.
Currently, we add support for Intel and AMD CPUs. We may extend this to Graviton later if experiments show a speedup on Graviton machines.
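A minimal sketch of the flag-reading step, assuming the detection works from the text of Linux `/proc/cpuinfo`: `CpuFlagParserSketch` and `parseFlags` are hypothetical names, and the PR's own `SimdUtils.readCpuFlags(OSType)` may work differently. The method takes the file content as a string so it can be exercised without touching the file system.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

final class CpuFlagParserSketch
{
    private CpuFlagParserSketch() {}

    // Parses /proc/cpuinfo text; x86 kernels list features under "flags", ARM kernels under "Features"
    static Set<String> parseFlags(String cpuInfo)
    {
        return cpuInfo.lines()
                .map(line -> line.split(":", 2))
                .filter(parts -> parts.length == 2)
                .filter(parts -> {
                    String key = parts[0].trim().toLowerCase(Locale.ENGLISH);
                    return key.equals("flags") || key.equals("features");
                })
                .flatMap(parts -> Arrays.stream(parts[1].trim().split("\\s+")))
                .filter(flag -> !flag.isEmpty())
                .map(flag -> flag.toLowerCase(Locale.ENGLISH))
                // The same flags repeat for every core; the set keeps one copy of each
                .collect(Collectors.toUnmodifiableSet());
    }

    // Example gate: only take the vectorized path when the CPU reports AVX-512F
    static boolean hasAvx512f(Set<String> flags)
    {
        return flags.contains("avx512f");
    }
}
```

On a real machine the input would come from something like `Files.readString(Path.of("/proc/cpuinfo"))`, with any read failure treated as "no SIMD support".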
Add vectorized path for null suppression in block serde
Add vectorized path for null suppression for byte/short/integer/long.
Microbenchmark results are given below.
Microbenchmark on an Intel CPU with AVX-512F support.
Microbenchmark on an AMD Zen 4 CPU with AVX-512F support.
The reason the speedup falls short of the potential maximum (16x for int, 8x for long with AVX-512) is
Change row length to 8192 in BenchmarkBlockSerde to match real workload case
Since PAGE_SPLIT_THRESHOLD_IN_BYTES is currently 2 * 1024 * 1024 in PageSplitterUtil, the row length of 10_000_000 used in BenchmarkBlockSerde is too long and does not match real workloads. Profiling shows that with a row length of 10_000_000, the majority of the time in BenchmarkBlockSerde is spent on this array creation.
After the change
Tests:
Next steps
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:
Summary by Sourcery
Add vectorized SIMD-based null suppression for block serialization using the Java Vector API with dynamic CPU feature detection
New Features:
Enhancements:
Build:
Tests:
Summary by Sourcery
Add vectorized null suppression for block serialization using the Java Vector API with dynamic CPU feature detection, refactor block encodings to leverage the new SIMD path, and update benchmarks and tests to validate correctness.
New Features:
Enhancements:
Tests: