[Enhancement] support iceberg rewrite manifests procedure (backport #68817) #69015
Open
mergify[bot] wants to merge 1 commit into branch-4.1
Signed-off-by: dontknow9179 <clin56322@gmail.com> (cherry picked from commit 56f6a7f)
🧪 CI Insights: Here's what we observed from your CI run for a2e260f. 🟢 All jobs passed!
Why I'm doing:
Iceberg tables can accumulate many small manifest files over time due to frequent commits and rewrites. This increases metadata size, slows down metadata reads and query planning, and can affect overall query performance. Apache Iceberg provides a `rewrite_manifests` operation to compact and reorganize these manifests, but StarRocks’ Iceberg integration did not expose it. This PR adds support so users can maintain Iceberg table metadata (compact manifests) directly from StarRocks.
What I'm doing:
Rewrite Manifests
- `rewrite_manifests()` (no arguments): rewrites the current snapshot’s manifest files, clusters them by partition (for better read locality), respects `commit.manifest.target-size-bytes`, and caps the number of manifest clusters to avoid OOM.
- Integrated via `IcebergTable`/`IcebergTableOperation` and wired to `ALTER TABLE ... EXECUTE rewrite_manifests()`.
Tests
- Tests for the procedure (`RewriteManifestsProcedureTest`) and for `AlterTableOperationStmt` (e.g. `rewrite_manifests` parsing).
- `test_iceberg_rewrite_manifests`, which creates an Iceberg table, inserts data, runs `rewrite_data_files()` and `rewrite_manifests()`, and checks manifest counts via `$manifests`.
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check:
This is an automatic backport of pull request #68817 done by [Mergify](https://mergify.com).
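
For reviewers who want a feel for the sizing logic described under "Rewrite Manifests", here is a minimal, self-contained sketch. It is not the code in this PR. The class `ManifestClusterSketch`, its method names, the `maxClusters` parameter, and the hash-based bucketing are all hypothetical and chosen only for illustration. The one grounded value is the 8 MB default that Iceberg documents for `commit.manifest.target-size-bytes`. The sketch derives a cluster count from the total manifest size and the target size, clamps it to a cap, and assigns entries to clusters by partition key so that entries from the same partition stay together.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; names and structure are assumptions, not the PR's code.
public class ManifestClusterSketch {
    // Iceberg's documented default for commit.manifest.target-size-bytes (8 MB).
    static final long DEFAULT_TARGET_SIZE_BYTES = 8L * 1024 * 1024;

    // Number of clusters: ceil(total / target), clamped to the range [1, maxClusters].
    static int clusterCount(long totalManifestBytes, long targetSizeBytes, int maxClusters) {
        if (targetSizeBytes <= 0 || maxClusters <= 0) {
            throw new IllegalArgumentException("target size and cap must be positive");
        }
        long wanted = (totalManifestBytes + targetSizeBytes - 1) / targetSizeBytes;
        return (int) Math.max(1, Math.min(wanted, maxClusters));
    }

    // Maps a partition key to a cluster id; equal keys always map to the same cluster.
    // Hash bucketing is an assumed stand-in for whatever the real procedure does.
    static int clusterOf(String partitionKey, int clusters) {
        return Math.floorMod(partitionKey.hashCode(), clusters);
    }

    // Groups partition keys (one per manifest entry) by their assigned cluster id.
    static Map<Integer, List<String>> cluster(List<String> partitionKeys, int clusters) {
        Map<Integer, List<String>> out = new TreeMap<>();
        for (String key : partitionKeys) {
            out.computeIfAbsent(clusterOf(key, clusters), k -> new ArrayList<>()).add(key);
        }
        return out;
    }

    public static void main(String[] args) {
        // 20 MB of manifests with the default target size and a cap of 100 clusters.
        int clusters = clusterCount(20L * 1024 * 1024, DEFAULT_TARGET_SIZE_BYTES, 100);
        List<String> keys = List.of("dt=2024-01-01", "dt=2024-01-01", "dt=2024-01-02");
        System.out.println("clusters = " + clusters);
        System.out.println("assignment = " + cluster(keys, clusters));
    }
}
```

The cap bounds how many output clusters (and therefore how many open manifest writers) exist at once, which is the kind of limit that would address the OOM concern mentioned above. The actual clustering key and cap used by the procedure should be read from the PR's diff.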