forked from PinkCrow007/arrow-rs
Merge arrow-rs/main into shredding-variant-part1 #5
Open: scovich wants to merge 54 commits into cmu-db:shredding-variant-part1 from scovich:shredding-variant-part1-merge-main
…iant kernel (apache#8201)

# Which issue does this PR close?

- Closes apache#8060.

# Rationale for this change

The `cast_to_variant` kernel needs support for the `List` and `LargeList` types.

# What changes are included in this PR?

Added support for `List` and `LargeList` in the `cast_to_variant` kernel.

# Are these changes tested?

Yes, added unit tests.

# Are there any user-facing changes?

Yes, the `cast_to_variant` kernel now accepts `List` and `LargeList` inputs.

---------

Co-authored-by: Konstantin.Tarasov <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
# Which issue does this PR close?

- Part of apache#4886

# Rationale for this change

This PR introduces benchmark tests for the `AvroWriter` in the `arrow-avro` crate. Adding these benchmarks is essential for tracking the performance of the writer, identifying potential regressions, and guiding future optimizations.

# What changes are included in this PR?

A new benchmark file, `benches/avro_writer.rs`, is added to the project. This file contains a suite of benchmarks that measure the performance of writing `RecordBatch`es to the Avro format.

The benchmarks cover a variety of Arrow data types:

- `Boolean`
- `Int32` and `Int64`
- `Float32` and `Float64`
- `Binary`
- `Timestamp` (Microsecond precision)
- A schema with a mix of the above types

These benchmarks are run with varying numbers of rows (100, 10,000, and 1,000,000) to assess performance across different data scales.

# Are these changes tested?

Yes, this pull request consists entirely of new benchmark tests. Therefore, no separate tests are needed.

# Are there any user-facing changes?

NA
This method was removed in apache#7824, which introduced an optimized code path for writing bloom filters on little-endian architectures. The method was however still used in the big-endian code-path. Due to the use of `#[cfg(target_endian)]` this went unnoticed in CI. Fixes apache#8207
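
To illustrate how a `#[cfg(target_endian)]` split can hide a broken branch, here is a minimal sketch. The function name `words_to_le_bytes` is hypothetical and is not the bloom filter code from the PR. Only one of the two definitions is compiled on any given target, so a stale call inside the big-endian body would not be noticed by a CI run that builds only for little-endian machines.

```rust
/// Serializes 32-bit words as little-endian bytes (hypothetical helper).
/// Compiled only on little-endian targets, where the native byte order
/// already matches the desired output order.
#[cfg(target_endian = "little")]
fn words_to_le_bytes(words: &[u32]) -> Vec<u8> {
    words.iter().flat_map(|w| w.to_ne_bytes()).collect()
}

/// Compiled only on big-endian targets. The compiler never type-checks this
/// body when building for a little-endian target, so a call to a removed
/// method here would go unnoticed until someone builds for big-endian.
#[cfg(target_endian = "big")]
fn words_to_le_bytes(words: &[u32]) -> Vec<u8> {
    words.iter().flat_map(|w| w.to_le_bytes()).collect()
}

fn main() {
    // Both branches must produce the same bytes regardless of the host.
    let bytes = words_to_le_bytes(&[1, 0x0102_0304]);
    assert_eq!(bytes, vec![1, 0, 0, 0, 4, 3, 2, 1]);
}
```

Keeping the two bodies as close as possible, or sharing one portable body built on `to_le_bytes`, reduces the amount of code that only one architecture ever compiles.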
…che#8177)

# Which issue does this PR close?

- Closes apache#8063

# Rationale for this change

Maps are now cast to `Variant::Object`s

# What changes are included in this PR?

# Are these changes tested?

Yes

# Are there any user-facing changes?

---------

Co-authored-by: Andrew Lamb <[email protected]>
…he#8105)

# Which issue does this PR close?

- Closes apache#8091.

# Rationale for this change

Implement `VariantArray::value` for some more shredded variants (e.g. primitive_conversion/generic_conversion/non_generic_conversion).

# What changes are included in this PR?

- Extract all `macro_rules!` macros to a separate module, `type_conversion.rs`
- Add a macro for `variant value`

# Are these changes tested?

Covered by the existing tests.

# Are there any user-facing changes?

No
…kernel (apache#8196)

# Which issue does this PR close?

- Closes apache#8195.

# Rationale for this change

# What changes are included in this PR?

Implement `DataType::Union` for `cast_to_variant`

# Are these changes tested?

Yes

# Are there any user-facing changes?

New cast type supported

---------

Co-authored-by: Andrew Lamb <[email protected]>
…pache#8206)

# Which issue does this PR close?

- Closes apache#8205

# Rationale for this change

`VariantArrayBuilder` had a very complex choreography with the `VariantBuilder` API that required lots of manual drop glue to deal with ownership transfers between it and the `VariantArrayVariantBuilder` it delegates the actual work to.

Rework the whole thing to use a (now-reusable) `MetadataBuilder` and `ValueBuilder`, with rollbacks largely handled by `ParentState`, just like the other builders in the parquet-variant crate.

# What changes are included in this PR?

Five changes (curated as five commits that reviewers may want to examine individually):

1. Make a bunch of parquet-variant builder infrastructure public, so that `VariantArrayBuilder` can access it from the parquet-variant-compute crate.
2. Make `MetadataBuilder` reusable. Its `finish` method appends the bytes of a new serialized metadata dictionary to the underlying buffer and resets the remaining builder state. The builder is thus ready to create a brand new metadata dictionary whose serialized bytes will also be appended to the underlying buffer once finished.
3. Rework `VariantArrayBuilder` to use `MetadataBuilder` and `ValueBuilder`, coordinated via `ParentState`. This is the main feature of the PR and also the most complicated/subtle.
4. Delete now-unused code that had been added previously in order to support the old implementation of `VariantArrayBuilder`.
5. Add missing doc comments for now-public types and methods.

# Are these changes tested?

Existing variant array builder tests cover the change.

# Are there any user-facing changes?

A lot of builder-related types and methods from the parquet-variant crate are now public.
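
The "reusable builder" idea in item 2 can be sketched as follows. This is a simplified illustration, not the parquet-variant implementation: the type `ReusableDictBuilder`, its methods, and the toy byte layout (a count byte followed by length-prefixed names) are all assumptions made for the example and do not match the real Variant metadata encoding.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a reusable metadata builder.
struct ReusableDictBuilder {
    /// Shared output buffer; every finished dictionary is appended here.
    buffer: Vec<u8>,
    /// Field names of the dictionary currently being built.
    field_names: Vec<String>,
}

impl ReusableDictBuilder {
    fn new() -> Self {
        Self { buffer: Vec::new(), field_names: Vec::new() }
    }

    /// Returns the id of `name`, adding it to the current dictionary if needed.
    fn upsert_field_name(&mut self, name: &str) -> u32 {
        if let Some(pos) = self.field_names.iter().position(|n| n == name) {
            return pos as u32;
        }
        self.field_names.push(name.to_string());
        (self.field_names.len() - 1) as u32
    }

    /// Appends the serialized dictionary to `buffer`, resets the per-dictionary
    /// state, and returns the byte range that was just written.
    /// Toy encoding: assumes fewer than 256 names, each shorter than 256 bytes.
    fn finish(&mut self) -> Range<usize> {
        let start = self.buffer.len();
        self.buffer.push(self.field_names.len() as u8);
        // `drain` empties `field_names`, leaving the builder ready for reuse.
        for name in self.field_names.drain(..) {
            self.buffer.push(name.len() as u8);
            self.buffer.extend_from_slice(name.as_bytes());
        }
        start..self.buffer.len()
    }
}

fn main() {
    let mut builder = ReusableDictBuilder::new();
    builder.upsert_field_name("a");
    builder.upsert_field_name("b");
    let first = builder.finish();
    // Field ids start again from zero for the next dictionary.
    builder.upsert_field_name("c");
    let second = builder.finish();
    println!("first = {first:?}, second = {second:?}");
}
```

Returning the byte range from `finish` lets a caller such as an array builder record per-row offsets into one contiguous buffer instead of allocating a new buffer per row.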
…g fails (apache#8213)

# Which issue does this PR close?

- Closes apache#8212

# Rationale for this change

In the original code, the bitmap was modified before decoding. If decoding then failed, the null buffer had already been modified, which corrupted the bitmap and eventually caused flush to fail.

# What changes are included in this PR?

This PR fixes the bug where the bitmap was modified before decoding. On a decoding failure, the bitmap is no longer modified and the decode method exits gracefully without any side effects.

# Are these changes tested?

- Added a unit test

# Are there any user-facing changes?

No.
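
The ordering fix described above follows a general pattern: run the fallible step first, and touch builder state only after it succeeds. The sketch below shows that pattern with a hypothetical `NullableI32Builder`; it is not the decoder from the PR, and the 4-byte little-endian "decoding" is an assumption chosen to keep the example small.

```rust
/// Hypothetical builder holding values plus a validity (null) bitmap.
struct NullableI32Builder {
    values: Vec<i32>,
    validity: Vec<bool>,
}

impl NullableI32Builder {
    /// Decodes a 4-byte little-endian integer and appends it.
    /// On error, neither `values` nor `validity` is modified.
    fn append_decoded(&mut self, raw: &[u8]) -> Result<(), String> {
        // Step 1: the fallible decode. Returning here leaves state untouched.
        if raw.len() != 4 {
            return Err(format!("expected 4 bytes, got {}", raw.len()));
        }
        let value = i32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]);

        // Step 2: only now mutate the bitmap and the values together,
        // so their lengths can never drift apart.
        self.validity.push(true);
        self.values.push(value);
        Ok(())
    }
}

fn main() {
    let mut builder = NullableI32Builder { values: vec![], validity: vec![] };
    // A failed decode must not leave a dangling validity bit behind.
    assert!(builder.append_decoded(&[1, 0, 0]).is_err());
    assert_eq!(builder.validity.len(), builder.values.len());
    builder.append_decoded(&[7, 0, 0, 0]).unwrap();
    assert_eq!(builder.values, vec![7]);
}
```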
# Which issue does this PR close?

- Closes apache#8152

# Rationale for this change

When manipulating existing variant values (unshredding, removing fields, etc), the metadata column is already defined and already contains all necessary field ids. In fact, defining new/different field ids would require rewriting the bytes of those already-encoded variant values.

We need a way to build variant values that rely on an existing metadata dictionary.

# What changes are included in this PR?

* `MetadataBuilder` is now a trait, and most methods that work with metadata builders now take `&mut dyn MetadataBuilder` instead of `&mut MetadataBuilder`.
* The old `MetadataBuilder` struct is now `BasicMetadataBuilder`, which implements `MetadataBuilder`.
* Define a `ReadOnlyMetadataBuilder` that wraps a `VariantMetadata` and also implements `MetadataBuilder`.
* Update the `try_binary_search_range_by` helper method to be more general, so we can define an efficient `VariantMetadata::get_entry` that returns the field id for a given field name.

# Are these changes tested?

Existing tests cover the basic metadata builder. New tests added to cover the read-only metadata builder.

# Are there any user-facing changes?

The renamed `BasicMetadataBuilder` (breaking), the new `MetadataBuilder` trait (breaking), and the new `ReadOnlyMetadataBuilder`.
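
The trait split described above can be pictured with the following sketch. All names here (`FieldNameDictionary`, `GrowableDictionary`, `ReadOnlyDictionary`, `write_field`) are hypothetical stand-ins and the signatures are assumptions, not the parquet-variant API. The point is only the shape: one implementation may add names, the other resolves names against a fixed, sorted dictionary and reports an error for unknown names.

```rust
/// Hypothetical trait playing the role of a metadata builder.
trait FieldNameDictionary {
    /// Returns the field id for `name`, or an error if it cannot be provided.
    fn try_upsert(&mut self, name: &str) -> Result<u32, String>;
}

/// Growable dictionary: unknown names are appended and get a fresh id.
struct GrowableDictionary {
    names: Vec<String>,
}

impl FieldNameDictionary for GrowableDictionary {
    fn try_upsert(&mut self, name: &str) -> Result<u32, String> {
        if let Some(pos) = self.names.iter().position(|n| n == name) {
            return Ok(pos as u32);
        }
        self.names.push(name.to_string());
        Ok((self.names.len() - 1) as u32)
    }
}

/// Read-only view over an existing dictionary whose names are sorted.
struct ReadOnlyDictionary<'a> {
    sorted_names: &'a [&'a str],
}

impl FieldNameDictionary for ReadOnlyDictionary<'_> {
    fn try_upsert(&mut self, name: &str) -> Result<u32, String> {
        // Binary search works because the names are assumed to be sorted.
        self.sorted_names
            .binary_search_by(|probe| (*probe).cmp(name))
            .map(|idx| idx as u32)
            .map_err(|_| format!("field name '{name}' is not in the existing dictionary"))
    }
}

/// Callers take a trait object, so either implementation can be plugged in.
fn write_field(dict: &mut dyn FieldNameDictionary, name: &str) -> Result<u32, String> {
    dict.try_upsert(name)
}

fn main() {
    let existing = ["a", "b"];
    let mut read_only = ReadOnlyDictionary { sorted_names: &existing };
    println!("{:?}", write_field(&mut read_only, "b"));
    println!("{:?}", write_field(&mut read_only, "zzz"));
}
```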
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/upload-pages-artifact](https://github.com/actions/upload-pages-artifact) from 3 to 4. Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…he#8210)

# Which issue does this PR close?

Closes apache#8209

# Rationale for this change

In the `Field` struct definition

```
/// A field within a [`Record`]
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct Field<'a> {
    /// Name of the field within the record
    #[serde(borrow)]
    pub name: &'a str,
    /// Optional documentation for this field
    #[serde(borrow, default)]
    pub doc: Option<&'a str>,
    /// The field's type definition
    #[serde(borrow)]
    pub r#type: Schema<'a>,
    /// Optional default value for this field
    #[serde(borrow, default)]
    pub default: Option<&'a str>,
}
```

`r#type` is a `Schema`, whereas `default` is a `str`. A default should be supported for all types (e.g. int, array, map, nested record), so the type of `default` should be more lenient. More details on reproduction are in the GitHub issue.

# What changes are included in this PR?

Relax the type of `default` in the Avro schema `Field`.

# Are these changes tested?

Added a unit test.

# Are there any user-facing changes?

It affects `pub struct Field` of the `arrow-avro` package, but the impact should be minimal as the `default` attribute is not being used.
…_type` (apache#8216)

# Which issue does this PR close?

None.

# Rationale for this change

I noticed an error in the doc comment about error conditions of `Field::try_canonical_extension_type`.

# What changes are included in this PR?

Fixed the doc comment.

# Are these changes tested?

No.

# Are there any user-facing changes?

No.
…t` kernel (apache#8215)

# Which issue does this PR close?

- Closes apache#8194.

# Rationale for this change

# What changes are included in this PR?

Implement `duration` the same as `interval`

# Are these changes tested?

Yes

# Are there any user-facing changes?
…pache#8141)

# Which issue does this PR close?

- Closes apache#8217

# Rationale for this change

When working with shredded variants, we need the ability to copy nested object fields and array elements of one variant to a destination. This is a cheap byte-wise copy that relies on the fact that the new variant being built uses the same metadata dictionary as the source variant it is derived from.

# What changes are included in this PR?

Define a helper macro that encapsulates the logic for variant appends, now that we have three very similar methods (differing only in their handling of list/object values and their return type).

Add new methods: `ValueBuilder::append_variant_bytes`, which is called by new methods `VariantBuilder::append_value_bytes`, `ListBuilder::append_value_bytes`, and `ObjectBuilder::[try_]insert_bytes`.

# Are these changes tested?

New unit tests

# Are there any user-facing changes?

The new methods are public.

---------

Co-authored-by: Andrew Lamb <[email protected]>
…e#8214)

# Which issue does this PR close?

- Closes apache#8184

# Rationale for this change

# What changes are included in this PR?

# Are these changes tested?

Yes

# Are there any user-facing changes?

`Object::finish` doesn't return `Result` anymore

---------

Co-authored-by: Andrew Lamb <[email protected]>
Reverts: - apache#8183 Because the related issue was closed: - apache#8181
# Which issue does this PR close?

- Closes apache#8243.

# What changes are included in this PR?

Pin `comfy-table` to a release prior to 7.2.0's MSRV bump to 1.85.

- Included a TODO to unpin after arrow bumps to 1.85

(Context, FWIW: caught in delta_kernel [MSRV CI](https://github.com/delta-io/delta-kernel-rs/actions/runs/17310376492/job/49143119497))

# Are these changes tested?

Validated MSRV with cargo-msrv:

```bash
# now passes
cargo msrv --path arrow-cast/ verify --rust-version 1.84 --all-features
```
# Which issue does this PR close?

- Closes apache#8228.

# What changes are included in this PR?

Add `Variant::as_f16`

# Are these changes tested?

Added doc tests

# Are there any user-facing changes?

Added doc for the function

---------

Co-authored-by: Matthijs Brobbel <[email protected]>
Updates the requirements on [hashbrown](https://github.com/rust-lang/hashbrown) to permit the latest version. Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
# Which issue does this PR close?

The doc for lexsort says it's stable. However, it's an unstable sort.

# Rationale for this change

Fix the documentation.

# What changes are included in this PR?

Fix the documentation.

# Are these changes tested?

No need

# Are there any user-facing changes?

Doc change

---------

Co-authored-by: Matthijs Brobbel <[email protected]>
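
For readers who relied on the old wording, the difference matters when rows compare equal: an unstable sort makes no promise about their relative order. The sketch below uses only the Rust standard library, not `lexsort`, and the function name is hypothetical. It shows one common way to get a deterministic, stable-looking result from an unstable sort by using the original position as a tiebreaker.

```rust
/// Sorts rows by their numeric key using an unstable sort, breaking ties by
/// original position so that equal keys keep their input order.
fn stable_order_via_unstable_sort<'a>(rows: &[(u32, &'a str)]) -> Vec<(u32, &'a str)> {
    // Pair every row with its original index.
    let mut indexed: Vec<(usize, (u32, &str))> = rows.iter().copied().enumerate().collect();
    // Compare by key first, then by original index. Because no two elements
    // compare equal any more, the instability of the sort cannot be observed.
    indexed.sort_unstable_by(|(ia, a), (ib, b)| a.0.cmp(&b.0).then(ia.cmp(ib)));
    indexed.into_iter().map(|(_, row)| row).collect()
}

fn main() {
    let rows = [(2, "x"), (1, "first"), (2, "y"), (1, "second")];
    let sorted = stable_order_via_unstable_sort(&rows);
    assert_eq!(sorted, vec![(1, "first"), (1, "second"), (2, "x"), (2, "y")]);
}
```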
# Which issue does this PR close?

\-

# Rationale for this change

Some services support gRPC compression. Expose this to the CLI client for:

- testing
- more efficient data transfer over slow internet connections

# What changes are included in this PR?

CLI argument wiring.

# Are these changes tested?

No automated tests. I think we can assume that the libraries we use do what they promise to do. But I also verified that this works by inspecting the traffic using Wireshark.

# Are there any user-facing changes?

They now have more options.
# Which issue does this PR close?

- Part of apache#4886
- Extends work initiated in apache#8006

# Rationale for this change

This introduces support for Confluent schema registry ID handling in the arrow-avro crate, adding compatibility with Confluent's wire format. These improvements enable streaming Apache Kafka, Redpanda, and Pulsar messages with Avro schemas directly into arrow-rs.

# What changes are included in this PR?

- Adds Confluent support
- Adds initial support for SHA256 and MD5 algorithm types. Rabin remains the default.

# Are these changes tested?

Yes, existing tests are all passing, and tests for ID handling have been added. Benchmark results show no appreciable changes.

# Are there any user-facing changes?

- Confluent users need to provide the ID fingerprint when using the `set` method, unlike the `register` method, which generates it from the schema on the fly. Existing API behavior has been maintained.
- SchemaStore TryFrom now accepts a `&HashMap<Fingerprint, AvroSchema>`, rather than a `&[AvroSchema]`

Huge shout out to @jecsand838 for his collaboration on this!

---------

Co-authored-by: Connor Sanders <[email protected]>
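
As background for the wire format mentioned above: a Confluent-framed message starts with a zero "magic" byte, followed by a 4-byte big-endian schema ID, followed by the Avro-encoded payload. The sketch below splits such a frame using only the standard library. The function name `split_confluent_frame` is hypothetical and this is not the arrow-avro API; it only illustrates the framing.

```rust
/// Splits a Confluent-framed message into (schema id, Avro payload).
/// Layout: [0x00 magic][4-byte big-endian schema id][payload...].
fn split_confluent_frame(frame: &[u8]) -> Result<(u32, &[u8]), String> {
    if frame.len() < 5 {
        return Err(format!("frame too short: {} bytes", frame.len()));
    }
    if frame[0] != 0 {
        return Err(format!("unexpected magic byte: {:#04x}", frame[0]));
    }
    let schema_id = u32::from_be_bytes([frame[1], frame[2], frame[3], frame[4]]);
    Ok((schema_id, &frame[5..]))
}

fn main() {
    // Magic byte, schema id 42, then two payload bytes.
    let frame = [0u8, 0, 0, 0, 42, 0xAA, 0xBB];
    let (schema_id, payload) = split_confluent_frame(&frame).unwrap();
    assert_eq!(schema_id, 42);
    assert_eq!(payload, &[0xAA, 0xBB]);
}
```

A reader would then use the extracted ID to look up the writer schema before decoding the payload.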
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5 to 6.

Release notes, sourced from [actions/setup-python's releases](https://github.com/actions/setup-python/releases):

## v6.0.0

### Breaking Changes

- Upgrade to node 24 (actions/setup-python#1164)

Make sure your runner is on version v2.327.1 or later to ensure compatibility with this release. [See Release Notes](https://github.com/actions/runner/releases/tag/v2.327.1)

### Enhancements:

- Add support for `pip-version` (actions/setup-python#1129)
- Enhance reading from .python-version (actions/setup-python#787)
- Add version parsing from Pipfile (actions/setup-python#1067)

### Bug fixes:

- Clarify pythonLocation behaviour for PyPy and GraalPy in environment variables (actions/setup-python#1183)
- Change missing cache directory error to warning (actions/setup-python#1182)
- Add Architecture-Specific PATH Management for Python with --user Flag on Windows (actions/setup-python#1122)
- Include python version in PyPy python-version output (actions/setup-python#1110)
- Update docs: clarification on pip authentication with setup-python (actions/setup-python#1156)

### Dependency updates:

- Upgrade idna from 2.9 to 3.7 in `/__tests__/data` (actions/setup-python#843)
- Upgrade form-data to fix critical vulnerabilities #182 & #183 (actions/setup-python#1163)
- Upgrade setuptools to 78.1.1 to fix path traversal vulnerability in PackageIndex.download (actions/setup-python#1165)
- Upgrade actions/checkout from 4 to 5 (actions/setup-python#1181)
- Upgrade `@actions/tool-cache` from 2.0.1 to 2.0.2 (actions/setup-python#1095)

## v5.6.0

- Workflow updates related to Ubuntu 20.04 (actions/setup-python#1065)
- Fix for Candidate Not Iterable Error (actions/setup-python#1082)
- Upgrade semver and `@types/semver` (actions/setup-python#1091)
- Upgrade prettier from 2.8.8 to 3.5.3 (actions/setup-python#1046)
- Upgrade ts-jest from 29.1.2 to 29.3.2 (actions/setup-python#1081)

## v5.5.0

### Enhancements:

- Support free threaded Python versions like '3.13t' (actions/setup-python#973)
- Enhance Workflows: Include ubuntu-arm runners, Add e2e Testing for free threaded and Upgrade `@action/cache` from 4.0.0 to 4.0.3 (actions/setup-python#1056)
- Add support for .tool-versions file in setup-python (actions/setup-python#1043)

### Bug fixes:

- Fix architecture for pypy on Linux ARM64 (actions/setup-python#1011). This update maps arm64 to aarch64 for Linux ARM64 PyPy installations.

... (truncated)

Commits are viewable in the [compare view](https://github.com/actions/setup-python/compare/v5...v6).

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/setup-node](https://github.com/actions/setup-node) from 4 to 5.

Release notes, sourced from [actions/setup-node's releases](https://github.com/actions/setup-node/releases):

## v5.0.0

### Breaking Changes

- Upgrade action to use node24 (actions/setup-node#1325)

Make sure your runner is updated to v2.327.1 or newer to use this release. [Release Notes](https://github.com/actions/runner/releases/tag/v2.327.1)

### Dependency Upgrades

- Upgrade `@octokit/request-error` and `@actions/github` (actions/setup-node#1227)
- Upgrade uuid from 9.0.1 to 11.1.0 (actions/setup-node#1273)
- Upgrade undici from 5.28.5 to 5.29.0 (actions/setup-node#1295)
- Upgrade form-data to bring in fix for critical vulnerability (actions/setup-node#1332)
- Upgrade actions/checkout from 4 to 5 (actions/setup-node#1345)

### Enhancement:

- Enhance caching in setup-node with automatic package manager detection (actions/setup-node#1348)

## v4.4.0

### Bug fixes:

- Make eslint-compact matcher compatible with Stylelint (actions/setup-node#98)
- Add support for indented eslint output (actions/setup-node#1245)

### Enhancement:

- Support private mirrors (actions/setup-node#1240)

### Dependency update:

- Upgrade `@action/cache` from 4.0.2 to 4.0.3 (actions/setup-node#1262)

## v4.3.0

### Dependency updates

- Upgrade `@actions/glob` from 0.4.0 to 0.5.0 (actions/setup-node#1200)
- Upgrade `@action/cache` from 4.0.0 to 4.0.2 (actions/setup-node#1251)
- Upgrade `@vercel/ncc` from 0.38.1 to 0.38.3 (actions/setup-node#1203)
- Upgrade `@actions/tool-cache` from 2.0.1 to 2.0.2 (actions/setup-node#1220)

... (truncated)

See the full diff in the [compare view](https://github.com/actions/setup-node/compare/v4...v5).

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…he#8179) # Which issue does this PR close? - Closes apache#8178 # Are these changes tested? Yes # Are there any user-facing changes? Can use `variant_get` for shredded numeric types --------- Co-authored-by: Andrew Lamb <[email protected]>
# Which issue does this PR close? - Part of apache#4886 - Follows up on apache#8047 # Rationale for this change When reading Avro into Arrow with a projection or a reader schema that omits some writer fields, we were still decoding those writer‑only fields item‑by‑item. This is unnecessary work and can dominate CPU time for large arrays/maps or deeply nested records. Avro’s binary format explicitly allows fast skipping for arrays/maps by encoding data in blocks: when the count is negative, the next `long` gives the byte size of the block, enabling O(1) skipping of that block without decoding each item. This PR teaches the record reader to recognize and leverage that, and to avoid constructing decoders for fields we will skip altogether. # What changes are included in this PR? **Reader / decoding architecture** - **Skip-aware record decoding**: - At construction time, we now precompute per-record **skip decoders** for writer fields that the reader will ignore. - Introduced a resolved-record path (`RecordResolved`) that carries: - `writer_to_reader` mapping for field alignment, - a prebuilt list of **skip decoders** for fields not present in the reader, - the set of active per-field decoders for the projected fields. - **Codec builder enhancements**: In `arrow-avro/src/codec.rs`, record construction now: - Builds Arrow `Field`s and their decoders only for fields that are read, - Builds `skip_decoders` (via `build_skip_decoders`) for fields to ignore. - **Error handling and consistency**: Kept existing strict-mode behavior; improved internal branching to avoid inconsistent states during partial decodes. **Tests** - **Unit tests (in `arrow-avro/src/reader/record.rs`)** - Added focused tests that exercise the new skip logic: - Skipping writer‑only fields inside **arrays** and **maps** (including negative‑count block skipping and mixed multi‑block payloads). 
- Skipping nested structures within records to ensure offsets and lengths remain correct for the fields that are read. - Ensured nullability and union handling remain correct when adjacent fields are skipped. - **Integration tests (in `arrow-avro/src/reader/mod.rs`)** - Added end‑to‑end test using `avro/alltypes_plain.avro` to validate that projecting a subset of fields (reader schema omits some writer fields) both: - Produces the correct Arrow arrays for the selected fields, and - Avoids decoding skipped fields (validated indirectly via behavior and block boundaries). - The test covers compressed and uncompressed variants already present in the suite to ensure behavior is consistent across codecs. # Are these changes tested? - **New unit tests** cover: - Fast skipping for arrays/maps using negative block counts and block sizes (per Avro spec). - Nested and nullable scenarios to ensure correct offsets, validity bitmaps, and flush behavior when adjacent fields are skipped. - **New integration test** in `reader/mod.rs`: - Reads `avro/alltypes_plain.avro` with a reader schema that omits several writer fields and asserts the resulting `RecordBatch` matches the expected arrays while exercising the skip path. - Existing promotion, enum, decimal, fixed, and union tests continue to pass, ensuring no regressions in unrelated areas. # Are there any user-facing changes? N/A since `arrow-avro` is not public yet.
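The block-skipping idea described above can be sketched in isolation. This is a minimal illustration, not the `arrow-avro` implementation: the function names `read_long` and `skip_long_array` are hypothetical, and the array items are assumed to be Avro `long`s so that the slow path has something concrete to decode. It relies only on the Avro binary encoding rules: longs are zigzag varints, a block count of zero ends the array, and a negative count is followed by the block's byte size.

```rust
/// Reads one Avro zigzag-encoded varint `long` from the front of `buf`.
/// Returns the decoded value and the number of bytes consumed.
/// (Hypothetical helper for this sketch.)
fn read_long(buf: &[u8]) -> Option<(i64, usize)> {
    let mut raw: u64 = 0;
    // A 64-bit varint occupies at most 10 bytes.
    for (i, &byte) in buf.iter().enumerate().take(10) {
        raw |= u64::from(byte & 0x7F) << (7 * i);
        if byte & 0x80 == 0 {
            // Undo zigzag: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
            let value = ((raw >> 1) as i64) ^ -((raw & 1) as i64);
            return Some((value, i + 1));
        }
    }
    None
}

/// Skips a whole Avro array of `long` items and returns the bytes consumed.
/// Blocks with a negative count are jumped over using their byte size;
/// blocks with a positive count have to be walked item by item.
fn skip_long_array(buf: &[u8]) -> Option<usize> {
    let mut pos = 0usize;
    loop {
        let (count, n) = read_long(buf.get(pos..)?)?;
        pos += n;
        if count == 0 {
            // A zero count terminates the array.
            return Some(pos);
        }
        if count < 0 {
            // Fast path: the next long is the block size in bytes.
            let (size, n) = read_long(buf.get(pos..)?)?;
            pos += n;
            pos = pos.checked_add(usize::try_from(size).ok()?)?;
            if pos > buf.len() {
                return None; // truncated input
            }
        } else {
            // Slow path: no size prefix, so decode each item to find its end.
            for _ in 0..count {
                let (_, n) = read_long(buf.get(pos..)?)?;
                pos += n;
            }
        }
    }
}
```

For example, the bytes `03 04 02 04 00` encode one block with count -2 and size 2, followed by the terminator; `skip_long_array` consumes all five bytes while decoding only the two header longs and the terminator.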
Bumps [actions/labeler](https://github.com/actions/labeler) from 5.0.0 to 6.0.0. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/actions/labeler/releases">actions/labeler's releases</a>.</em></p> <blockquote> <h2>v6.0.0</h2> <h2>What's Changed</h2> <ul> <li>Add workflow file for publishing releases to immutable action package by <a href="https://github.com/jcambass"><code>@jcambass</code></a> in <a href="https://redirect.github.com/actions/labeler/pull/802">actions/labeler#802</a></li> </ul> <h3>Breaking Changes</h3> <ul> <li>Upgrade Node.js version to 24 in action and dependencies <a href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a href="https://redirect.github.com/actions/labeler/pull/891">actions/labeler#891</a> Make sure your runner is on version v2.327.1 or later to ensure compatibility with this release. <a href="https://github.com/actions/runner/releases/tag/v2.327.1">Release Notes</a></li> </ul> <h3>Dependency Upgrades</h3> <ul> <li>Upgrade eslint-config-prettier from 9.0.0 to 9.1.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/711">actions/labeler#711</a></li> <li>Upgrade eslint from 8.52.0 to 8.55.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/720">actions/labeler#720</a></li> <li>Upgrade <code>@types/jest</code> from 29.5.6 to 29.5.11 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/719">actions/labeler#719</a></li> <li>Upgrade <code>@types/js-yaml</code> from 4.0.8 to 4.0.9 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/718">actions/labeler#718</a></li> <li>Upgrade <code>@typescript-eslint/parser</code> from 6.9.0 to 6.14.0 by <a 
href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/717">actions/labeler#717</a></li> <li>Upgrade prettier from 3.0.3 to 3.1.1 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/726">actions/labeler#726</a></li> <li>Upgrade eslint from 8.55.0 to 8.56.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/725">actions/labeler#725</a></li> <li>Upgrade <code>@typescript-eslint/parser</code> from 6.14.0 to 6.19.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/745">actions/labeler#745</a></li> <li>Upgrade eslint-plugin-jest from 27.4.3 to 27.6.3 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/744">actions/labeler#744</a></li> <li>Upgrade <code>@typescript-eslint/eslint-plugin</code> from 6.9.0 to 6.20.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/750">actions/labeler#750</a></li> <li>Upgrade prettier from 3.1.1 to 3.2.5 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/752">actions/labeler#752</a></li> <li>Upgrade undici from 5.26.5 to 5.28.3 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/757">actions/labeler#757</a></li> <li>Upgrade braces from 3.0.2 to 3.0.3 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/789">actions/labeler#789</a></li> <li>Upgrade minimatch from 9.0.3 to 10.0.1 by <a 
href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/805">actions/labeler#805</a></li> <li>Upgrade <code>@actions/core</code> from 1.10.1 to 1.11.1 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/811">actions/labeler#811</a></li> <li>Upgrade typescript from 5.4.3 to 5.7.2 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/819">actions/labeler#819</a></li> <li>Upgrade <code>@typescript-eslint/parser</code> from 7.3.1 to 8.17.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/824">actions/labeler#824</a></li> <li>Upgrade prettier from 3.2.5 to 3.4.2 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/825">actions/labeler#825</a></li> <li>Upgrade <code>@types/jest</code> from 29.5.12 to 29.5.14 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/827">actions/labeler#827</a></li> <li>Upgrade eslint-plugin-jest from 27.9.0 to 28.9.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/832">actions/labeler#832</a></li> <li>Upgrade ts-jest from 29.1.2 to 29.2.5 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/831">actions/labeler#831</a></li> <li>Upgrade <code>@vercel/ncc</code> from 0.38.1 to 0.38.3 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/830">actions/labeler#830</a></li> <li>Upgrade typescript 
from 5.7.2 to 5.7.3 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/835">actions/labeler#835</a></li> <li>Upgrade eslint-plugin-jest from 28.9.0 to 28.11.0 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/839">actions/labeler#839</a></li> <li>Upgrade undici from 5.28.4 to 5.28.5 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/842">actions/labeler#842</a></li> <li>Upgrade <code>@octokit/request-error</code> from 5.0.1 to 5.1.1 by <a href="https://github.com/dependabot"><code>@dependabot</code></a>[bot] in <a href="https://redirect.github.com/actions/labeler/pull/846">actions/labeler#846</a></li> </ul> <h3>Documentation changes</h3> <ul> <li>Add note regarding <code>pull_request_target</code> to README.md by <a href="https://github.com/silverwind"><code>@silverwind</code></a> in <a href="https://redirect.github.com/actions/labeler/pull/669">actions/labeler#669</a></li> <li>Update readme with additional examples and important note about <code>pull_request_target</code> event by <a href="https://github.com/IvanZosimov"><code>@IvanZosimov</code></a> in <a href="https://redirect.github.com/actions/labeler/pull/721">actions/labeler#721</a></li> <li>Document update - permission section by <a href="https://github.com/harithavattikuti"><code>@harithavattikuti</code></a> in <a href="https://redirect.github.com/actions/labeler/pull/840">actions/labeler#840</a></li> <li>Improvement in documentation for pull_request_target event usage in README by <a href="https://github.com/suyashgaonkar"><code>@suyashgaonkar</code></a> in <a href="https://redirect.github.com/actions/labeler/pull/871">actions/labeler#871</a></li> <li>Fix broken links in documentation by <a 
href="https://github.com/suyashgaonkar"><code>@suyashgaonkar</code></a> in <a href="https://redirect.github.com/actions/labeler/pull/822">actions/labeler#822</a></li> </ul> <h2>New Contributors</h2> <ul> <li><a href="https://github.com/silverwind"><code>@silverwind</code></a> made their first contribution in <a href="https://redirect.github.com/actions/labeler/pull/669">actions/labeler#669</a></li> <li><a href="https://github.com/Jcambass"><code>@Jcambass</code></a> made their first contribution in <a href="https://redirect.github.com/actions/labeler/pull/802">actions/labeler#802</a></li> <li><a href="https://github.com/suyashgaonkar"><code>@suyashgaonkar</code></a> made their first contribution in <a href="https://redirect.github.com/actions/labeler/pull/822">actions/labeler#822</a></li> <li><a href="https://github.com/HarithaVattikuti"><code>@HarithaVattikuti</code></a> made their first contribution in <a href="https://redirect.github.com/actions/labeler/pull/840">actions/labeler#840</a></li> <li><a href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> made their first contribution in <a href="https://redirect.github.com/actions/labeler/pull/891">actions/labeler#891</a></li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... 
(truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/actions/labeler/commit/f1a63e87db0c6baf19c5713083f8d00d789ca184"><code>f1a63e8</code></a> Update Node.js version to 24 in action and dependencies (<a href="https://redirect.github.com/actions/labeler/issues/891">#891</a>)</li> <li><a href="https://github.com/actions/labeler/commit/b0a1180683c9f17424de4d71c044bea4c7b9bc7c"><code>b0a1180</code></a> Bump <code>@octokit/request-error</code> from 5.0.1 to 5.1.1 (<a href="https://redirect.github.com/actions/labeler/issues/846">#846</a>)</li> <li><a href="https://github.com/actions/labeler/commit/110d44140c9195b853f2f24044bbfed8f4968efb"><code>110d441</code></a> Update README.md (<a href="https://redirect.github.com/actions/labeler/issues/871">#871</a>)</li> <li><a href="https://github.com/actions/labeler/commit/bee50fefe18762fad67754b2f3bfff2c8082ebb8"><code>bee50fe</code></a> Bump undici from 5.28.4 to 5.28.5 (<a href="https://redirect.github.com/actions/labeler/issues/842">#842</a>)</li> <li><a href="https://github.com/actions/labeler/commit/6463cdb00ee92c05bec55dffc4e1fce250301945"><code>6463cdb</code></a> Bump eslint-plugin-jest from 28.9.0 to 28.11.0 (<a href="https://redirect.github.com/actions/labeler/issues/839">#839</a>)</li> <li><a href="https://github.com/actions/labeler/commit/c209686724ee12fcc5e6294d1d569b91f86fa691"><code>c209686</code></a> Bump typescript from 5.7.2 to 5.7.3 (<a href="https://redirect.github.com/actions/labeler/issues/835">#835</a>)</li> <li><a href="https://github.com/actions/labeler/commit/5184940b544b0096088a7b42d1b8a551003d9eb1"><code>5184940</code></a> Bump <code>@vercel/ncc</code> from 0.38.1 to 0.38.3 (<a href="https://redirect.github.com/actions/labeler/issues/830">#830</a>)</li> <li><a href="https://github.com/actions/labeler/commit/3629d5568b59204f18786372f6d740d649719488"><code>3629d55</code></a> Document update - permission section (<a 
href="https://redirect.github.com/actions/labeler/issues/840">#840</a>)</li> <li><a href="https://github.com/actions/labeler/commit/d24f7f3731b2a06433c0bccc364d560c5329c48f"><code>d24f7f3</code></a> Bump ts-jest from 29.1.2 to 29.2.5 (<a href="https://redirect.github.com/actions/labeler/issues/831">#831</a>)</li> <li><a href="https://github.com/actions/labeler/commit/425a1f14222185c7500cf43245beafe96356561d"><code>425a1f1</code></a> Bump eslint-plugin-jest from 27.9.0 to 28.9.0 (<a href="https://redirect.github.com/actions/labeler/issues/832">#832</a>)</li> <li>Additional commits viewable in <a href="https://github.com/actions/labeler/compare/v5.0.0...v6.0.0">compare view</a></li> </ul> </details> <br /> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Resolves conflicts between PR 8166 (shredding support) and PR 8179 (multi-type support): - Preserves PR 8179's comprehensive multi-type support for all numeric primitives - Keeps PR 8166's superior row builder architecture and shredding support - Integrates both test suites for complete coverage - Maintains enhanced path parsing from PR 8166 The merge successfully combines: - Multi-type variant_get support (Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64, Float16, Float32, Float64) - Advanced shredding capabilities with row builder approach - Comprehensive test coverage from both PRs
Hmm, the diff is a real mess. I'm not sure it would actually make sense to merge this directly into your branch, because it probably wouldn't register as a proper merge commit.
# Which issue does this PR close? - Closes apache#7173. # Rationale for this change Ability to round-trip timezone information. # What changes are included in this PR? Impl `Display` for `Tz` # Are these changes tested? A simple test that strings round trip. # Are there any user-facing changes? New API
# Which issue does this PR close? - Closes apache#8273 . # Rationale for this change When working with the library using encryption, we have sometimes found it necessary to modify an existing set of `WriterProperties` on a per-file basis to set specific encryption properties. More generally, others may need to use an existing set of `WriterProperties` as a template and modify the properties. I have implemented this feature by adding an `into_builder` method, which appears to be the standard approach in other parts of the library. # Are these changes tested? Yes, `test_writer_properties_builder` has been updated to add a round-trip test for `into_builder`. # Are there any user-facing changes? Yes. `WriterProperties` now has a new `into_builder` method. --------- Co-authored-by: Andrew Lamb <[email protected]>
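The template-and-modify workflow described above can be illustrated with a small stand-in. The types `Props` and `PropsBuilder` below are hypothetical and are not the parquet crate's `WriterProperties`; the setters and fields are invented for the example. The sketch only shows the shape of an `into_builder` round trip: take a finished value, turn it back into a builder, override one setting, and rebuild.

```rust
/// Hypothetical stand-in for an immutable properties object.
#[derive(Debug, Clone, PartialEq)]
struct Props {
    compression: String,
    encryption_key_id: Option<String>,
}

/// Hypothetical builder that owns the properties while they are edited.
struct PropsBuilder {
    props: Props,
}

impl Props {
    /// Starts a builder with default settings.
    fn builder() -> PropsBuilder {
        PropsBuilder {
            props: Props {
                compression: "uncompressed".to_string(),
                encryption_key_id: None,
            },
        }
    }

    /// Converts finished properties back into a builder so that they can
    /// serve as a template for a modified copy.
    fn into_builder(self) -> PropsBuilder {
        PropsBuilder { props: self }
    }
}

impl PropsBuilder {
    fn set_compression(mut self, codec: &str) -> Self {
        self.props.compression = codec.to_string();
        self
    }

    fn set_encryption_key_id(mut self, id: &str) -> Self {
        self.props.encryption_key_id = Some(id.to_string());
        self
    }

    fn build(self) -> Props {
        self.props
    }
}
```

A per-file variant would then be written as `template.clone().into_builder().set_encryption_key_id("file-1").build()`, leaving every other setting as the template had it.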
# Which issue does this PR close? - Part of apache#5854. # Rationale for this change Backport changes to allow apples-to-apples comparison of thrift decoding # What changes are included in this PR? Adds a page header benchmark and updates bench names to match those in feature branch. # Are these changes tested? No tests needed...only changes to benchmark # Are there any user-facing changes? No
Bumps [actions/github-script](https://github.com/actions/github-script) from 7 to 8. Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/labeler](https://github.com/actions/labeler) from 6.0.0 to 6.0.1. Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
# Which issue does this PR close? - Closes apache#8261. # Rationale for this change Give the async writer the same API as the sync writer. # What changes are included in this PR? There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. # Are these changes tested? Yes, added `test_async_arrow_group_writer`. # Are there any user-facing changes? Yes, adds two public functions, `get_column_writers` and `append_row_group`, to `AsyncArrowWriter`.
# Which issue does this PR close? - Closes apache#8283. # Rationale for this change Add the `Variant::as_u*` functions # Are these changes tested? Added doc tests # Are there any user-facing changes? No
# Which issue does this PR close? - Closes apache#8234. # Rationale for this change # What changes are included in this PR? - Grouping related data types together (e.g., numeric types, temporal types). - Extracting large code snippets from match branches into helper functions. - Reordering tests to align with the data type order. # Are these changes tested? Covered by existing tests # Are there any user-facing changes? N/A
) # Which issue does this PR close? - Part of apache#4886 - Follows up on apache#8047 # Rationale for this change Avro `enum` values are **encoded by index** but are **semantically identified by symbol name**. During schema evolution it is legal for the writer and reader to use different enum symbol *orders* so long as the **symbol set is compatible**. The Avro specification requires that, when resolving a writer enum against a reader enum, the value be mapped **by symbol name**, not by the writer’s numeric index. If the writer’s symbol is not present in the reader’s enum and the reader defines a default, the default is used; otherwise it is an error. # What changes are included in this PR? **Core changes** - Implement **writer to reader enum symbol remapping**: - Build a fast lookup table at schema resolution time from **writer enum index to reader enum index** using symbol **names**. - Apply this mapping during decode so the produced Arrow dictionary keys always reference the **reader’s** symbol order. - If a writer symbol is not found in the reader enum, surface a clear error. # Are these changes tested? Yes. This PR adds comprehensive **unit tests** for enum mapping in `reader/record.rs` and a **real‑file integration test** in `reader/mod.rs` using `avro/simple_enum.avro`. # Are there any user-facing changes? N/A due to `arrow-avro` not being public yet.
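The name-based remapping described above can be sketched as follows. This is an illustration under stated assumptions rather than the code in `reader/record.rs`: the function names `build_enum_mapping` and `remap_enum_index` are hypothetical, and the `default` parameter models the reader-default rule from the Avro specification that the rationale mentions.

```rust
use std::collections::HashMap;

/// Builds a table indexed by writer enum position whose entries are the
/// matching reader enum position, matched by symbol name.
/// `None` marks a writer symbol that the reader does not know.
fn build_enum_mapping(writer: &[&str], reader: &[&str]) -> Vec<Option<usize>> {
    let by_name: HashMap<&str, usize> = reader
        .iter()
        .enumerate()
        .map(|(index, symbol)| (*symbol, index))
        .collect();
    writer.iter().map(|symbol| by_name.get(symbol).copied()).collect()
}

/// Translates one decoded writer index into a reader index.
/// Falls back to `default` (the reader's default symbol position, if any)
/// when the writer symbol is unknown to the reader.
fn remap_enum_index(
    mapping: &[Option<usize>],
    writer_index: usize,
    default: Option<usize>,
) -> Result<usize, String> {
    match mapping.get(writer_index) {
        None => Err(format!("writer enum index {writer_index} is out of range")),
        Some(Some(reader_index)) => Ok(*reader_index),
        Some(None) => default.ok_or_else(|| {
            format!("writer enum index {writer_index} has no reader symbol and no default")
        }),
    }
}
```

Building the table once at schema resolution time keeps the per-value cost during decode to a single slice lookup.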
# Which issue does this PR close? \- # Rationale for this change This is apache#4875 now that the upstream changes are available. Allows analysis of TLS traffic with an external tool like Wireshark. See https://wiki.wireshark.org/TLS#using-the-pre-master-secret # What changes are included in this PR? New flag that opts into the standard `SSLKEYLOGFILE` handling that other libraries and browsers support. # Are these changes tested? No automated test, but I did validate that setting the flag AND the env variable emits a log file that Wireshark successfully uses to decrypt the traffic. # Are there any user-facing changes? Mostly none for normal users, but might be helpful for developers.
# Rationale for this change Update the docstring of the `write()` function on struct `Writer` to reflect that it writes only one `RecordBatch` at a time, as opposed to a vector of record batches. # What changes are included in this PR? Just the doc comment change described above # Are these changes tested? yes # Are there any user-facing changes? No --------- Co-authored-by: Andrew Lamb <[email protected]> Co-authored-by: Matthijs Brobbel <[email protected]>
# Which issue does this PR close? - Closes apache#8294. # Rationale for this change The .NET implementation is extracted to apache/arrow-dotnet from apache/arrow. apache/arrow will remove `csharp/` eventually. So we should use apache/arrow-dotnet for integration test. # What changes are included in this PR? * Set `ARCHERY_INTEGRATION_WITH_DOTNET=1` to use the .NET implementation * Checkout apache/arrow-dotnet # Are these changes tested? Yes. # Are there any user-facing changes? No.
…pache#8257) # Which issue does this PR close? - Closes apache#8256 . # Rationale for this change Do not compress a v2 data page when compression does not help (the compressed size is greater than or equal to the uncompressed size). # What changes are included in this PR? Discard the compressed output when it is no smaller than the uncompressed data # Are these changes tested? Covered by existing tests # Are there any user-facing changes? No
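The rule above amounts to a size comparison after compressing. A minimal sketch, assuming a hypothetical helper `choose_page_body` and a caller-supplied compression closure (neither is the parquet crate's API): the returned flag corresponds to the `is_compressed` field that the Parquet v2 data page header carries.

```rust
/// Compresses `uncompressed` with `compress` and keeps the result only if
/// it is strictly smaller. Returns the bytes to write and whether they are
/// compressed. (Hypothetical helper for this sketch.)
fn choose_page_body<F>(uncompressed: &[u8], compress: F) -> (Vec<u8>, bool)
where
    F: Fn(&[u8]) -> Vec<u8>,
{
    let compressed = compress(uncompressed);
    if compressed.len() >= uncompressed.len() {
        // Compression did not pay off: store the page body as-is.
        (uncompressed.to_vec(), false)
    } else {
        (compressed, true)
    }
}
```

The comparison uses `>=` so that a tie also falls back to the uncompressed bytes, which spares the reader a pointless decompression step.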
# Which issue does this PR close? - Part of apache#4886 # Rationale for this change This refactor streamlines the `arrow-avro` writer by introducing a single, schema‑driven `RecordEncoder` that plans writes up front and encodes rows using consistent, explicit rules for nullability and type dispatch. It reduces duplication in nested/struct/list handling, makes the order of Avro union branches (null‑first vs null‑second) an explicit choice, and aligns header schema generation with value encoding. This should improve correctness (especially for nested optionals), make behavior easier to reason about, and pave the way for future optimizations. # What changes are included in this PR? **High‑level:** * Introduces a unified, schema‑driven `RecordEncoder` with a builder that walks the Avro record in Avro order and maps each field to its Arrow column, producing a reusable write plan. The encoder covers scalars and nested types (struct, (large) lists, maps, strings/binaries). * Applies a single model of **nullability** throughout encoding, including nested sites (list items, fixed‑size list items, map values), and uses explicit union‑branch indices according to the chosen order. **API and implementation details:** * **Writer / encoder refactor** * Replaces the previous per‑column/child encoding paths with a **`FieldPlan`** tree (variants for `Scalar`, `Struct { … }`, and `List { … }`) and per‑site `nullability` carried from the Avro schema. * Adds encoder variants for `LargeBinary`, `Utf8`, `Utf8Large`, `List`, `LargeList`, and `Struct`. * Encodes union branch indices with `write_optional_index` (writes `0x00/0x02` according to Null‑First/Null‑Second), replacing the old branch write. * **Schema generation & metadata** * Moves the **`Nullability`** enum to `schema.rs` and threads it through schema generation and writer logic. 
* Adds `AvroSchema::from_arrow_with_options(schema, Option<Nullability>)` to either reuse embedded Avro JSON or build new Avro JSON that **honors the requested null‑union order at all nullable sites**. * Adds `extend_with_passthrough_metadata` so Arrow schema metadata is copied into Avro JSON while skipping Avro‑reserved and internal Arrow keys. * Introduces helpers like `wrap_nullable` and `arrow_field_to_avro_with_order` to apply ordering consistently for arrays, fixed‑size lists, maps, structs, and unions. * **Format and glue** * Simplifies `writer/format.rs` by removing the `EncoderOptions` plumbing from the OCF format; `write_long` remains exported for header writing. # Are these changes tested? Yes. * Adds focused unit tests in `writer/encoder.rs` that verify scalar and string/binary encodings (e.g., Binary/LargeBinary, Utf8/LargeUtf8) and validate length/branch encoding primitives used by the writer. * Round trip integration tests that validate List and Struct decoding in `writer/mod.rs`. * Adjusts existing schema tests (e.g., decimal metadata expectations) to align with the new schema/metadata handling. # Are there any user-facing changes? N/A because arrow-avro is not public yet. --------- Co-authored-by: Ryan Johnson <[email protected]> Co-authored-by: Matthijs Brobbel <[email protected]>
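The `0x00/0x02` bytes mentioned above follow from Avro's union encoding: the zero-based branch index is written as a zigzag long, so branch 0 is the byte `0x00` and branch 1 is `0x02`. A minimal sketch of that choice for a two-branch nullable union, using a hypothetical `NullOrder` enum and `write_union_branch` function whose names and signatures are assumptions rather than the crate's:

```rust
/// Position of the `null` branch in a two-branch nullable union.
/// (Hypothetical stand-in for the null-first / null-second choice.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum NullOrder {
    NullFirst,
    NullSecond,
}

/// Appends the union branch index for one value of a nullable field.
/// Branch 0 encodes as 0x00 and branch 1 as 0x02 (zigzag-encoded long).
fn write_union_branch(out: &mut Vec<u8>, is_null: bool, order: NullOrder) {
    let uses_first_branch = match order {
        // ["null", T]: null is branch 0, the value is branch 1.
        NullOrder::NullFirst => is_null,
        // [T, "null"]: the value is branch 0, null is branch 1.
        NullOrder::NullSecond => !is_null,
    };
    out.push(if uses_first_branch { 0x00 } else { 0x02 });
}
```

After the branch byte, a non-null value is followed by its normal encoding, while a null contributes no further bytes.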
# Which issue does this PR close? - Part of apache#4886 # Rationale for this change Apache Avro’s `decimal` logical type annotates either `bytes` or `fixed` and carries `precision` and `scale`. Implementations should reject invalid combinations such as `scale > precision`, and the underlying bytes are the two’s‑complement big‑endian representation of the unscaled integer. On the Arrow side, Rust now exposes first‑class `Decimal32`, `Decimal64`, `Decimal128`, and `Decimal256` data types with documented maximum precisions (9, 18, 38, 76 respectively). Until now, `arrow-avro` decoded all Avro decimals to 128/256‑bit Arrow decimals, even when a narrower type would suffice. # What changes are included in this PR? **`arrow-avro/src/codec.rs`** * Map `Codec::Decimal(precision, scale, _size)` to Arrow’s `Decimal32`/`64`/`128`/`256` **by precision**, preferring the narrowest type (≤9→32, ≤18→64, ≤38→128, otherwise 256). * Strengthen decimal attribute parsing: * Error if `scale > precision`. * Error if `precision` exceeds Arrow’s maximum (Decimal256). * If Avro uses `fixed`, check that declared `precision` fits the byte width (≤4→max 9, ≤8→18, ≤16→38, ≤32→76). * Update docstring of `Codec::Decimal` to mention `Decimal32`/`64`. **`arrow-avro/src/reader/record.rs`** * Add `Decoder::Decimal32` and `Decoder::Decimal64` variants with corresponding builders (`Decimal32Builder`, `Decimal64Builder`). * Builder selection: * If Avro uses **fixed**: choose by size (≤4→Decimal32, ≤8→Decimal64, ≤16→Decimal128, ≤32→Decimal256). * If Avro uses **bytes**: choose by declared precision (≤9/≤18/≤38/≤76). * Implement decode paths that sign‑extend Avro’s two’s‑complement payload to 4/8 bytes and append values to the new builders; update `append_null`/`flush` for 32/64‑bit decimals. 

**`arrow-avro/src/reader/mod.rs` (tests)**

* Expand `test_decimal` to assert that:
  * bytes‑backed decimals with precision 4 map to `Decimal32`, and those with precision 10 map to `Decimal64`;
  * legacy fixed\[8] decimals map to `Decimal64`;
  * fixed\[16] decimals map to `Decimal128`.
* Add a nulls path test for bytes‑backed `Decimal32`.

# Are these changes tested?

Yes. Unit tests under `arrow-avro/src/reader/mod.rs` construct expected `Decimal32Array`/`Decimal64Array`/`Decimal128Array` with `with_precision_and_scale`, and compare against batches decoded from Avro files (including legacy fixed and bytes‑backed cases). The tests also exercise small batch sizes to cover buffering paths; new Avro data files are added for higher‑width decimals.

New Avro test file details:

- `test/data/int256_decimal.avro`: bytes, logicalType: decimal(precision=76, scale=10)
- `test/data/fixed256_decimal.avro`: fixed\[32], logicalType: decimal(precision=76, scale=10)
- `test/data/fixed_length_decimal_legacy_32.avro`: fixed\[4], logicalType: decimal(precision=9, scale=2)
- `test/data/int128_decimal.avro`: bytes, logicalType: decimal(precision=38, scale=2)

These new Avro test files were created using this script: https://gist.github.com/jecsand838/3890349bdb33082a3e8fdcae3257eef7

There is also an arrow-testing PR for these new files: apache/arrow-testing#112

# Are there any user-facing changes?

N/A due to `arrow-avro` not being public.
Merging apache#8166 with upstream main was a bit hairy because of strong logical conflicts with apache#8179.
Hopefully this helps unblock the PR.