out_s3: Add parquet compression type with pure C #10691
Walkthrough
This change introduces conditional support for Apache Parquet compression in the AWS S3 output plugin, using Arrow GLib and Parquet GLib when they are available. It adds CMake detection for Parquet, new constants and functions for Parquet compression, and updates the S3 plugin to recognize and handle the "parquet" compression type alongside the existing options.
Sequence Diagram(s)
sequenceDiagram
participant User
participant S3Plugin
participant AWSCompress
participant ArrowCompress
participant ParquetGLib
User->>S3Plugin: Configure compression = "parquet"
S3Plugin->>AWSCompress: Request compression function for "parquet"
AWSCompress->>ArrowCompress: Call out_s3_compress_parquet(json, size)
ArrowCompress->>ParquetGLib: Parse JSON to Arrow Table
ArrowCompress->>ParquetGLib: Convert Arrow Table to Parquet buffer
ParquetGLib-->>ArrowCompress: Return Parquet buffer
ArrowCompress-->>AWSCompress: Return compressed buffer and size
AWSCompress-->>S3Plugin: Return compressed buffer
S3Plugin-->>User: Uploads Parquet-compressed data to S3
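The lookup step in the diagram above (the plugin asking for the compression function that matches the configured keyword) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the names `compress_fn`, `compression_option`, `lookup_compression`, and `passthrough`, and the build flag `EXAMPLE_HAVE_PARQUET`, are all hypothetical, and the encoder is a placeholder that only copies its input. The point it shows is that the "parquet" keyword is only registered when the optional libraries were detected at build time.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical signature: read an input buffer, return a newly
 * allocated output buffer that the caller must free(). */
typedef int (*compress_fn)(const void *in, size_t in_len,
                           void **out, size_t *out_len);

/* Placeholder encoder (assumption): copies the input unchanged.
 * A real build would call into gzip or Parquet code here. */
int passthrough(const void *in, size_t in_len, void **out, size_t *out_len)
{
    void *buf = malloc(in_len > 0 ? in_len : 1);

    if (buf == NULL) {
        return -1;
    }
    memcpy(buf, in, in_len);
    *out = buf;
    *out_len = in_len;
    return 0;
}

/* One table row per supported compression keyword. */
struct compression_option {
    const char *keyword;
    compress_fn fn;
};

static const struct compression_option options[] = {
    { "gzip", passthrough },
#ifdef EXAMPLE_HAVE_PARQUET /* hypothetical flag set by the build system */
    { "parquet", passthrough },
#endif
};

/* Return the function registered for a keyword, or NULL when the
 * keyword is unknown or its support was not compiled in. */
compress_fn lookup_compression(const char *keyword)
{
    size_t i;

    for (i = 0; i < sizeof(options) / sizeof(options[0]); i++) {
        if (strcmp(options[i].keyword, keyword) == 0) {
            return options[i].fn;
        }
    }
    return NULL;
}
```

With this layout, a caller that receives NULL from `lookup_compression("parquet")` can report that the keyword is unsupported in the current build instead of failing later during upload.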
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~18 minutes
Actionable comments posted: 1
🧹 Nitpick comments (2)
.github/workflows/unit-tests.yaml (2)
55-55: -DFLB_ARROW=On is always built, but the Arrow/Parquet libs are not conditionally installed
The matrix now always attempts an Arrow build with both compilers, yet the required libraries are installed unconditionally later (Lines 94-101). This means every job, even those that do not pass -DFLB_ARROW=On, now spends extra time adding the Arrow APT source and downloading large packages.
Consider restricting the new flag to a dedicated matrix entry, or gate the install step with if: matrix.flb_option == '-DFLB_ARROW=On' to save several minutes per CI run.
70-73: exclude block duplicates compiler filtering already implied by build flag
-DFLB_ARROW=On is excluded for clang here, but the flag is offered in the matrix only once (Line 55). If future contributors move the flag around, this silent coupling between two lists becomes error-prone.
A clearer pattern is to define dedicated strategy axes, e.g.
    include:
      - flb_option: "-DFLB_ARROW=On"
        compiler:
          cc: gcc
          cxx: g++
This removes the need for paired exclude rows and keeps intent explicit.
Signed-off-by: Hiroshi Hatake <[email protected]>
Using LIBRARIES does not point to the correct location of the Apache arrow-glib and parquet-glib libraries.
Signed-off-by: Hiroshi Hatake <[email protected]>
@cosmo0920 any update here?
With the Apache Arrow GLib and Parquet GLib libraries, we're able to support the Parquet format in out_s3.
Enter [N/A] in the box if an item is not applicable to your change.
Testing
Before we can approve your change, please submit the following in a comment:
With leaks on macOS, there are no leaks:
With valgrind:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-test label to test for all targets (requires maintainer to do).
Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
New Features
Bug Fixes