fix #226: Index-Out-Of-Bounds panic when using #[serde(skip_serializing_if=..)] #227

elad-yosifon · 2025-07-16T23:03:57Z

fix #226: Index-Out-Of-Bounds panic when using #[serde(skip_serializing_if=..)]

Copilot

Pull Request Overview

This PR adds support for skip_field in SerializeStruct to prevent index-out-of-bounds panics when using #[serde(skip_serializing_if)], and introduces tests covering various skip scenarios.

Implemented skip_field to increment the item counter for skipped fields
Added tests for skipping the first, middle, last, and multiple struct fields
Ensures no panic occurs when serializing with skipped optional fields

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
avro/tests/avro-rs-226.rs	New tests for skip-serializing-if behavior in structs
avro/src/ser_schema.rs	Added `skip_field` method to `SerializeStruct` impl

Comments suppressed due to low confidence (2)

avro/tests/avro-rs-226.rs:6

[nitpick] The test function names reference issue 225 but the PR fixes issue 226. Consider renaming these to avro_rs_226_... for consistency.

fn avro_rs_225_index_out_of_bounds_with_serde_skip_serializing_skip_middle_field() -> TestResult {

avro/tests/avro-rs-226.rs:22

These tests only ensure no panic occurs but don’t verify the serialized output. Consider adding assertions to check that only the expected fields are serialized.

    writer.into_inner()?;

avro/src/ser_schema.rs

martin-g · 2025-07-17T06:49:37Z

I applied the suggestions by Copilot (compare the serialized record with the deserialized one) and the assertion fails ... The Some(1) is lost ... I'll try to debug it soon!

martin-g · 2025-07-17T12:16:19Z

This won't work with the current version of avro-rs.

With your PR the field is properly skipped, i.e. nothing is serialized for this field.
But Avro's Reader uses T::schema() to drive the deserialization of the data, so it wrongly reads y's value into the x field and z's value into the y field.
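The misread described above can be reproduced with a deliberately simplified model. The sketch below is not avro-rs code and does not use Avro's real binary encoding: it assumes every field is an optional i64 written as one tag byte (0 = null, 1 = value) followed by 8 little-endian bytes, and the names `write_field` and `read_fields` are hypothetical. What it shares with Avro is that the byte stream carries no field names, so the reader consumes fields strictly in schema order.

```rust
// Simplified stand-in for a positional record encoding (assumption: NOT the
// real Avro wire format). Tag byte 0 = null, 1 = value + 8 little-endian bytes.
fn write_field(out: &mut Vec<u8>, value: Option<i64>) {
    match value {
        None => out.push(0),
        Some(v) => {
            out.push(1);
            out.extend_from_slice(&v.to_le_bytes());
        }
    }
}

// The reader knows only how many fields the schema declares and reads them
// in order; nothing in the bytes says which field a value belongs to.
fn read_fields(mut bytes: &[u8], count: usize) -> Result<Vec<Option<i64>>, String> {
    let mut fields = Vec::with_capacity(count);
    for _ in 0..count {
        let (&tag, rest) = bytes.split_first().ok_or("unexpected end of input")?;
        bytes = rest;
        match tag {
            0 => fields.push(None),
            1 => {
                if bytes.len() < 8 {
                    return Err("truncated value".to_string());
                }
                let (raw, rest) = bytes.split_at(8);
                let mut buf = [0u8; 8];
                buf.copy_from_slice(raw);
                fields.push(Some(i64::from_le_bytes(buf)));
                bytes = rest;
            }
            other => return Err(format!("invalid tag {}", other)),
        }
    }
    Ok(fields)
}
```

In this model, writing a null marker for x followed by y = Some(1) and z = Some(2) reads back as three fields in the right slots. Writing only y and z, as a silent skip would, makes the first two schema slots (x and y) receive y's and z's values, and reading a third field fails because the input is exhausted.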

For full roundtrip we will need a Serde-driven deserialization too.
ser.rs and de.rs are Schema-driven impls.
ser_schema.rs is Serde-driven serialization. We will need de_schema.rs (or some better names) for Serde-driven impl.

@jdarais Do you agree with me here ?

elad-yosifon · 2025-07-21T18:41:25Z

@martin-g any ETA on this? I do understand that my PR is not fixing anything.

My current situation is that I have a struct that derives serde::Serialize, and I need to use the same struct for CSV, JSON, and AVRO. Since I am trying to avoid redundant fields (mostly for JSON), I cannot remove the skip_serializing_if directive AFAIK.

Is there a workaround you can suggest?

martin-g · 2025-07-21T19:25:58Z

@martin-g any ETA on this?

Dunno.
I started working on it at #237
Feel free to send PRs against my branch!

…ializing_if = "Option::is_none")]

…d one Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

… field We need to serialize something, otherwise the deserialization does not know that something has been skipped and tries to read a value according to the schema Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

martin-g · 2025-07-23T07:23:16Z

@elad-yosifon While working on the serde-driven deserialization I've realized that skip_field() needs to serialize something, otherwise the deserialization uses the wrong Avro Schema of the next field.
So I've added logic to use the RecordField's default value (an Option).
Now only the multiple skipped fields test fails due to the skip attribute. For some reason Serde does not call skip_field() for this field ...

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

elad-yosifon · 2025-07-23T07:26:59Z

@martin-g thanks for the update.
This seems odd, but at least the test caught it :)

Are you done with the debugging, or still at it?

martin-g · 2025-07-23T07:28:56Z

I have some other things to do now but I will continue debugging it later.
Any help/hints why Serde does not call the method are welcome!

elad-yosifon · 2025-07-23T07:31:16Z

@martin-g maybe it is a niche optimization, as all of the NONE optionals are at the tail, so one might think there is no need to indicate the skip.

martin-g · 2025-07-23T07:54:03Z

The y field is tagged with skip_serializing_if and skip. The next field (z) has just skip_serializing_if

https://github.com/apache/avro-rs/pull/227/files#diff-d3a12aeda34e86ab5d302f29a1673cef4fd727f5ad5d9bab18380817a88a47d5R90-R94

The skip_field() method is called directly for z!
Removing skip_serializing_if for y does not change anything.

martin-g · 2025-07-23T11:28:29Z

skip_serializing_if leads to a call to skip_field!
But skip_serializing (without the _if!) does not!
skip attribute just sets skip_serializing and skip_deserializing
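To make the difference concrete, here is a hand-written approximation of what a derived Serialize impl is understood to do for each attribute. It is a sketch only: `StructSink`, `CallLog`, `Sample`, and `serialize_sample` are hypothetical stand-ins, and the real serde trait is generic over the value type and returns a Result.

```rust
// Hypothetical stand-in for serde's SerializeStruct, reduced to the two
// calls discussed in this thread.
trait StructSink {
    fn serialize_field(&mut self, key: &'static str, value: Option<i64>);
    fn skip_field(&mut self, key: &'static str);
}

// Records which methods were called, in order.
#[derive(Default)]
struct CallLog(Vec<String>);

impl StructSink for CallLog {
    fn serialize_field(&mut self, key: &'static str, _value: Option<i64>) {
        self.0.push(format!("serialize_field({})", key));
    }
    fn skip_field(&mut self, key: &'static str) {
        self.0.push(format!("skip_field({})", key));
    }
}

#[allow(dead_code)]
struct Sample {
    x: Option<i64>, // no attribute
    y: Option<i64>, // imagine #[serde(skip_serializing)] (or #[serde(skip)])
    z: Option<i64>, // imagine #[serde(skip_serializing_if = "Option::is_none")]
}

// Approximation of the derived code, written by hand for illustration.
fn serialize_sample<S: StructSink>(s: &Sample, sink: &mut S) {
    sink.serialize_field("x", s.x);
    // y: an unconditional skip produces no call at all, so the sink is
    // never told that the field exists.
    if s.z.is_none() {
        sink.skip_field("z");
    } else {
        sink.serialize_field("z", s.z);
    }
}
```

With z set to None the log contains `serialize_field(x)` followed by `skip_field(z)`, and nothing for y, which matches the behaviour observed above.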

Serde's `skip` breaks it because it does not notify us when `#[serde(skip_serializing)]` is used. Drop the support for tuple structs (e.g. `struct S(A, B, C)`) - it is hard to map them to an Avro record schema. It should still work for Avro array. Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

martin-g · 2025-07-23T13:07:45Z

@elad-yosifon I'd be OK to merge this PR in its current state if it is enough for your use case.
It supports skip_serializing_if but it does not support skip_serializing/skip!
The latter will probably be supported once #237 is finished (but it is a ton of work...).

elad-yosifon · 2025-07-23T13:30:46Z

That’s good for now 👍


Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

jdarais · 2025-07-24T15:29:03Z

Sorry about the delayed response, I've been traveling. On this question:

For full roundtrip we will need a Serde-driven deserialization too.
ser.rs and de.rs are Schema-driven impls.
ser_schema.rs is Serde-driven serialization. We will need de_schema.rs (or some better names) for Serde-driven impl.

I was looking into creating a serde-driven deserializer, but in the end it seemed easier to first deserialize into an avro Value, which provides some flexibility that helps with schema resolution (if the reader schema is different from the writer schema,) and then use the existing de.rs implementation to convert from Value to the final type.

I feel like this approach should still work for a round-trip of serialization and deserialization. If it doesn't then I think that would mean that either the serde-driven serialization or the schema-driven deserialization has a bug, or simply that the default value used for serialization, (provided by the schema,) is different from the default value used for deserialization, (which as far as I can tell uses Default::default and bypasses deserialization logic altogether.)

In general, the skip_serializing_if attribute seems to be at odds with serialization to Avro, since Avro records make no accommodation for skipping serialization of a field: all fields of a record must be present. Using the default value for the field provided by the schema is a reasonable workaround, since that simulates omitting the field. Per the spec, though, the default value is "only used when reading instances that lack the field for schema evolution purposes. The presence of a default value does not make the field optional at encoding time." Also, with the changes from this PR, if the default value is not present, then the struct will be serialized incorrectly.

Using the skip attribute would be easier to use with Avro since you can just ensure that the skipped field isn't included in the Avro schema used to write.

Now that I think about it, I'm guessing that skip_serializing_if is mostly useful for removing attributes that are just null or the default value so they don't take up unnecessary space in the serialized payload. It might work to just have the skip_field implementation do the exact same thing as serialize_field, effectively always including the field, since you're not allowed to omit the field anyway.

On a different topic: I see from the PR that there was some logic in there before that ensured that struct fields were written in the correct order, (they must be written in the order in which they appear in the schema,) even if the order of the fields in the rust struct is different from what's in the schema. It'd be nice to get that back in. (I realize there should have been a test for it.)
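One way the ordering logic could be restored is to buffer each field's encoded bytes by name and emit them in schema order when the struct ends. The sketch below illustrates that idea under stated assumptions: `OrderedRecordWriter` is a hypothetical type, not the code that was removed from ser_schema.rs, and fields arrive already encoded as byte vectors.

```rust
use std::collections::HashMap;

// Hypothetical helper, not avro-rs code: collects encoded fields in whatever
// order the Rust struct declares them, then writes them in schema order.
struct OrderedRecordWriter<'a> {
    schema_order: &'a [&'a str],
    buffered: HashMap<&'a str, Vec<u8>>,
}

impl<'a> OrderedRecordWriter<'a> {
    fn new(schema_order: &'a [&'a str]) -> Self {
        OrderedRecordWriter {
            schema_order,
            buffered: HashMap::new(),
        }
    }

    // Called once per struct field, in struct declaration order.
    fn serialize_field(&mut self, key: &'a str, encoded: Vec<u8>) -> Result<(), String> {
        if !self.schema_order.contains(&key) {
            return Err(format!("field `{}` is not in the schema", key));
        }
        self.buffered.insert(key, encoded);
        Ok(())
    }

    // Called at the end of the struct: emit fields in schema order and
    // fail if the schema expects a field that was never provided.
    fn finish(mut self) -> Result<Vec<u8>, String> {
        let mut out = Vec::new();
        for name in self.schema_order {
            let bytes = self
                .buffered
                .remove(name)
                .ok_or_else(|| format!("missing field `{}`", name))?;
            out.extend(bytes);
        }
        Ok(out)
    }
}
```

Buffering costs an extra allocation per field, so a real implementation might only fall back to it when a field arrives out of order.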

martin-g · 2025-07-25T07:01:55Z

Hi @jdarais !

I was looking into creating a serde-driven deserializer, but in the end it seemed easier to first deserialize into an avro Value, which provides some flexibility that helps with schema resolution (if the reader schema is different from the writer schema,) and then use the existing de.rs implementation to convert from Value to the final type.

The problem here is that the users expect that the schema-driven deserialization in de.rs should take into account the Serde attributes, like the skip ones. But maybe as you said we just have to ignore them. Otherwise we will face issues later with attributes like flatten, remote, etc.
But in that case I have no good answer to "I want to use the same serde mechanism for JSON, CSV, Avro, ..." (#227 (comment)).

Using the skip attribute would be easier to use with Avro since you can just ensure that the skipped field isn't included in the Avro schema used to write.

Supporting skip is OK.
But what to do with skip_serializing and skip_deserializing?! We will need to have a field for such fields in the Avro Schema, but Serde does not call skip_field() when skip_serializing is used ...

It might work to just have the skip_field implementation do the exact same thing as serialize_field, effectively always including the field, since you're not allowed to omit the field anyway.

We kinda do this at the moment.

fn serialize_field<T>(&mut self, key: &'static str, value: &T) -> Result<(), Self::Error>
fn skip_field(&mut self, key: &'static str) -> Result<(), Self::Error>

skip_field does not provide the value, so we use the default from the Avro Schema. This might need some more work! At the moment the default is treated as an Option, but it should probably be unpacked.

On a different topic: I see from the PR that there was some logic in there before that ensured that struct fields were written in the correct order, (they must be written in the order in which they appear in the schema,) even if the order of the fields in the rust struct is different from what's in the schema. It'd be nice to get that back in. (I realize there should have been a test for it.)

You are totally right here!
It should work fine when the schema is derived, but it will break badly at deserialization time if the schema is created manually!
For now we can add a note to the documentation!

I removed all logic that didn't break any test :-/
It could be returned with the respective unit tests!

#227 (comment) Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

martin-g · 2025-07-25T07:16:28Z

It should work fine when the schema is derived, but it will break badly at deserialization time if the schema is created manually! For now we can add a note to the documentation!

#242

…#242) #227 (comment) Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

jdarais · 2025-07-26T04:46:40Z

Ah, those are some good points...

Supporting skip is OK.
But what to do with skip_serializing and skip_deserializing?! We will need to have a field for such fields in the Avro Schema, but Serde does not call skip_field() when skip_serializing is used ...

I think skip_serializing and skip_deserializing are also doable. Since the Serializer is the one with the schema, the schema is decoupled from the struct implementing Serialize, so you just have to make sure that the schema you use to serialize doesn't include the field with the skip_serializing attribute. (Of course, reading the value back would require a different schema that includes the field 🙁 but it is doable with different read and write schemas. If you use skip_serializing but not skip_deserializing, then maybe that's just what you're signing yourself up for.)
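The different-read-and-write-schemas idea relies on Avro's schema resolution rule that a reader field absent from the writer's data takes the reader schema's default. A minimal sketch of that rule follows, assuming all values are i64 and using hypothetical names (`ReaderField`, `resolve`); it is not the avro-rs resolution code.

```rust
use std::collections::HashMap;

// Hypothetical reader-side field description: a name and an optional default.
struct ReaderField {
    name: &'static str,
    default: Option<i64>,
}

// `written` holds the fields decoded with the writer schema. For each reader
// field, take the written value if present, otherwise the reader's default;
// a field with neither is an error.
fn resolve(
    written: &HashMap<&str, i64>,
    reader: &[ReaderField],
) -> Result<Vec<(&'static str, i64)>, String> {
    reader
        .iter()
        .map(|f| match written.get(f.name).copied().or(f.default) {
            Some(v) => Ok((f.name, v)),
            None => Err(format!("field `{}` is missing and has no default", f.name)),
        })
        .collect()
}
```

Here a struct field marked skip_serializing would be left out of the writer schema and would only be recoverable on the read side if the reader schema declares a default for it.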

skip_deserializing should work more easily, since if we first read the value into an avro Value using the schema, then when we convert from Value to the struct using the de.rs implementation, it should be fine to ignore the value since doing so won't cause issues with parsing the rest of the record in the same way that ignoring a field does when serializing.

skip_field does not provide the value, so we use the default from the Avro Schema. This might need some more work! At the moment it is treated as an Option but probably it should be unpacked.

Ah, that's a very good point. That's a bummer that the value isn't provided in skip_field. Yeah, if the default value is None (meaning no default value was specified) then we're kind of stuck, since there's no way to know what should go in that field. Might be good to just return an Err in that case.
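The suggestion to return an Err could look roughly like the sketch below. It is an illustration under stated assumptions, not the merged implementation: `FieldSchema` and `RecordWriter` are hypothetical, values are plain i64, and fields are expected in schema order.

```rust
// Hypothetical field description; avro-rs's real RecordField is richer.
struct FieldSchema {
    name: &'static str,
    default: Option<i64>, // None means the schema declares no default
}

struct RecordWriter<'a> {
    fields: &'a [FieldSchema],
    next: usize,   // index of the next schema field to be written
    out: Vec<i64>, // stand-in for the encoded output
}

impl<'a> RecordWriter<'a> {
    fn new(fields: &'a [FieldSchema]) -> Self {
        RecordWriter { fields, next: 0, out: Vec::new() }
    }

    // Advance the field counter, checking that `key` is the expected field.
    fn expect(&mut self, key: &str) -> Result<&'a FieldSchema, String> {
        let fields: &'a [FieldSchema] = self.fields;
        let field = fields
            .get(self.next)
            .ok_or_else(|| format!("unexpected extra field `{}`", key))?;
        if field.name != key {
            return Err(format!("expected field `{}`, got `{}`", field.name, key));
        }
        self.next += 1;
        Ok(field)
    }

    fn serialize_field(&mut self, key: &str, value: i64) -> Result<(), String> {
        self.expect(key)?;
        self.out.push(value);
        Ok(())
    }

    // No value is available here, so fall back to the schema default and
    // report an error when the schema does not provide one.
    fn skip_field(&mut self, key: &str) -> Result<(), String> {
        let field = self.expect(key)?;
        match field.default {
            Some(d) => {
                self.out.push(d);
                Ok(())
            }
            None => Err(format!("field `{}` was skipped but has no default", key)),
        }
    }
}
```

Failing early keeps the output from silently shifting later fields into the wrong positions, at the cost of rejecting structs whose schema lacks a default for a skippable field.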

martin-g requested a review from Copilot July 17, 2025 03:44

Copilot AI reviewed Jul 17, 2025


Elad Yosifon and others added 3 commits July 23, 2025 10:05

fix apache#226: Index-Out-Of-Bounds panic when using #[serde(skip_ser…

cec6e27

…ializing_if = "Option::is_none")]

Issue apache#225 - Compare the deserialized result with the serialize…

b150d4d

…d one Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

martin-g force-pushed the bugfix-226 branch from 1cb237d to 4a19a1b Compare July 23, 2025 07:25

fmt

b8c9562

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

martin-g added 2 commits July 23, 2025 15:51

clippy + fmt

2a1cf2f

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

Add ASLv2 header for the new IT test

9db884b

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

martin-g merged commit 673bec2 into apache:main Jul 24, 2025
20 checks passed

martin-g added a commit that referenced this pull request Jul 25, 2025

doc: Document that the fields' order is important for "the serde way"

b0ab323

#227 (comment) Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

martin-g mentioned this pull request Jul 25, 2025

doc: Document that the fields' order is important for "the serde way" #242

Merged

martin-g added a commit that referenced this pull request Jul 25, 2025

doc: Document that the fields' order is important for "the serde way" (…

70867f1

…#242) #227 (comment) Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>

jdarais mentioned this pull request Jul 26, 2025

Turn skip_field into a skip_field_if_possible serde-rs/serde#2012

Open
