Skip to content

Conversation

elad-yosifon
Copy link
Contributor

fix #226: Index-Out-Of-Bounds panic when using #[serde(skip_serializing_if=..)]

@martin-g martin-g requested a review from Copilot July 17, 2025 03:44
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for skip_field in SerializeStruct to prevent index-out-of-bounds panics when using #[serde(skip_serializing_if)], and introduces tests covering various skip scenarios.

  • Implemented skip_field to increment the item counter for skipped fields
  • Added tests for skipping the first, middle, last, and multiple struct fields
  • Ensures no panic occurs when serializing with skipped optional fields

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
avro/tests/avro-rs-226.rs New tests for skip-serializing-if behavior in structs
avro/src/ser_schema.rs Added skip_field method to SerializeStruct impl
Comments suppressed due to low confidence (2)

avro/tests/avro-rs-226.rs:6

  • [nitpick] The test function names reference issue 225 but the PR fixes issue 226. Consider renaming these to avro_rs_226_... for consistency.
fn avro_rs_225_index_out_of_bounds_with_serde_skip_serializing_skip_middle_field() -> TestResult {

avro/tests/avro-rs-226.rs:22

  • These tests only ensure no panic occurs but don’t verify the serialized output. Consider adding assertions to check that only the expected fields are serialized.
    writer.into_inner()?;

@martin-g
Copy link
Member

I applied the suggestions by Copilot (compare the serialized record with the deserialized one) and the assertion fails ... The Some(1) is lost ... I'll try to debug it soon!

@martin-g
Copy link
Member

This won't work with the current version of avro-rs.

With your PR the field is properly skipped, i.e. nothing is serialized for this field.
But Avro's Reader uses the T::schema() to lead the deserialization of the data and wrongly reads the y's value into x field and z's value into y.

For full roundtrip we will need a Serde-driven deserialization too.
ser.rs and de.rs are Schema-driven impls.
ser_schema.rs is Serde-driven serialization. We will need de_schema.rs (or some better names) for Serde-driven impl.

@jdarais Do you agree with me here ?

@elad-yosifon
Copy link
Contributor Author

@martin-g any ETA on this? I do understand that my PR is not fixing anything.

My current situation is that I have a struct that derives serde::Serialize, and I need to use the same struct for CSV, JSON, and AVRO. Since I am trying to avoid redundant fields (mostly for JSON), I cannot remove the skip_serializing_if directive AFAIK.

Is there a workaround you can suggest?

@martin-g
Copy link
Member

@martin-g any ETA on this?

Dunno.
I started working on it at #237
Feel free to send PRs against my branch!

Elad Yosifon and others added 3 commits July 23, 2025 10:05
… field

We need to serialize something, otherwise the deserialization does not
know that something has been skipped and tries to read a value according
to the schema

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>
@martin-g
Copy link
Member

martin-g commented Jul 23, 2025

@elad-yosifon While working on the serde-driven deserialization I've realized that skip_field() needs to serialize something, otherwise the deserialization uses the wrong Avro Schema of the next field.
So I've added logic to use the RecordField's default value (an Option).
Now only the multiple skipped fields test fails due to the skip attribute. For some reason Serde does not call skip_field() for this field ...

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>
@elad-yosifon
Copy link
Contributor Author

@martin-g thanks for the update.
This seems odd, but at least the test caught it :)

Are you done with the debugging, or still at it?

@martin-g
Copy link
Member

I have some other things to do now but I will continue debugging it later.
Any help/hints why Serde does not call the method are welcome!

@elad-yosifon
Copy link
Contributor Author

@martin-g maybe it is a niche optimization, as all of the NONE optionals are at the tail, so one might think there is no need to indicate the skip.

@martin-g
Copy link
Member

The y field is tagged with skip_serializing_if and skip. The next field (z) has just skip_serializing_if

https://github.com/apache/avro-rs/pull/227/files#diff-d3a12aeda34e86ab5d302f29a1673cef4fd727f5ad5d9bab18380817a88a47d5R90-R94

The skip_field() method is called directly for z!
Removing skip_serializing_if for y does not change anything.

@martin-g
Copy link
Member

skip_serializing_if leads to a call to skip_field!
But skip_serializing (without the _if!) does not!
skip attribute just sets skip_serializing and skip_deserializing

martin-g added 2 commits July 23, 2025 15:51
Serde's `skip` breaks it because it does not notify us when
`#[serde(skip_serialize)]` is used.

Drop the support for Struct tuple (e.g. `struct S(A, B, C)` - it is hard
to map it to Avro record schema. It should still work for Avro array.

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>
Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>
@martin-g
Copy link
Member

@elad-yosifon I'd be OK to merge this PR in its current state if it is enough for your use case.
It supports skip_serializing_if but it does not support skip_serializing/skip!
The latter will probably be supported once #237 is finished (but it is a ton of work...).

@elad-yosifon
Copy link
Contributor Author

elad-yosifon commented Jul 23, 2025 via email

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>
@martin-g martin-g merged commit 673bec2 into apache:main Jul 24, 2025
20 checks passed
@jdarais
Copy link
Contributor

jdarais commented Jul 24, 2025

Sorry about the delayed response, I've been traveling. On this question:

For full roundtrip we will need a Serde-driven deserialization too.
ser.rs and de.rs are Schema-driven impls.
ser_schema.rs is Serde-driven serialization. We will need de_schema.rs (or some better names) for Serde-driven impl.

I was looking into creating a serde-driven deserializer, but in the end it seemed easier to first deserialize into an avro Value, which provides some flexibility that helps with schema resolution (if the reader schema is different from the writer schema,) and then use the existing de.rs implementation to convert from Value to the final type.

I feel like this approach should still work for a round-trip of serialization and deserialization. If it doesn't then I think that would mean that either the serde-driven serialization or the schema-driven deserialization has a bug, or simply that the default value used for serialization, (provided by the schema,) is different from the default value used for deserialization, (which as far as I can tell uses Default::default and bypasses deserialization logic altogether.)

In general, the skip_serializing_if attribute seems to be at odds with serialization to Avro, since Avro structs give no accommodation for simply skipping serialization of a field: all fields of a struct must be present. (Using the default value for the field provided by the schema is a reasonable workaround, since that simulates omiting the field, though per the spec, the default value is "only used when reading instances that lack the field for schema evolution purposes. The presence of a default value does not make the field optional at encoding time." Also, with the changes from this PR, if the default value is not present, then the struct will be serialized incorrectly.)

Using the skip attribute would be easier to use with Avro since you can just ensure that the skipped field isn't included in the Avro schema used to write.

Now that I think about it, I'm guessing that skip_serializing_if is mostly useful for removing attributes that are just null or the default value so they don't take up unnecessary space in the serialized payload. It might work to just have the skip_field implementation do the exact same thing as serialize_field, effectively always including the field, since you're not allowed to omit the field anyway.

On a different topic: I see from the PR that there was some logic in there before that ensured that struct fields were written in the correct order, (they must be written in the order in which they appear in the schema,) even if the order of the fields in the rust struct is different from what's in the schema. It'd be nice to get that back in. (I realize there should have been a test for it.)

@martin-g
Copy link
Member

Hi @jdarais !

I was looking into creating a serde-driven deserializer, but in the end it seemed easier to first deserialize into an avro Value, which provides some flexibility that helps with schema resolution (if the reader schema is different from the writer schema,) and then use the existing de.rs implementation to convert from Value to the final type.

The problem here is that the users expect that the schema-driven deserialization in de.rs should take into account the Serde attributes, like the skip ones. But maybe as you said we just have to ignore them. Otherwise we will face issues later with attributes like flatten, remote, etc.
But in that case I have no good answer to "I want to use the same serde mechanism for JSON, CSV, Avro, ...) (#227 (comment)).

Using the skip attribute would be easier to use with Avro since you can just ensure that the skipped field isn't included in the Avro schema used to write.

Supporting skip is OK.
But what to do with skip_serializing and skip_deserialing ?! We will need to have a field for such fields in the Avro Schema but Serde does not call skip_field() when skip_serializing is used ...

It might work to just have the skip_field implementation do the exact same thing as serialize_field, effectively always including the field, since you're not allowed to omit the field anyway.

We kinda do this at the moment.

  • fn serialize_field<T>(&mut self, key: &'static str, value: &T)
  • fn skip_field(&mut self, key: &'static str)
    skip_field does not provide the value, so we use the default from the Avro Schema. This might need some more work! At the moment it is treated as an Option but probably it should be unpacked.

On a different topic: I see from the PR that there was some logic in there before that ensured that struct fields were written in the correct order, (they must be written in the order in which they appear in the schema,) even if the order of the fields in the rust struct is different from what's in the schema. It'd be nice to get that back in. (I realize there should have been a test for it.)

You are totally right here!
It should work fine when the schema is derived but it will break bad at deserialization time if the schema is created manually!
For now we can add a note to the documentation!

I removed all logic that didn't break any test :-/
It could be returned with the respective unit tests!

@martin-g
Copy link
Member

It should work fine when the schema is derived but it will break bad at deserialization time if the schema is created manually! For now we can add a note to the documentation!

#242

martin-g added a commit that referenced this pull request Jul 25, 2025
@jdarais
Copy link
Contributor

jdarais commented Jul 26, 2025

Ah, those are some good points...

Supporting skip is OK.
But what to do with skip_serializing and skip_deserialing ?! We will need to have a field for such fields in the Avro Schema but Serde does not call skip_field() when skip_serializing is used ...

I think skip_serializing and skip_deserializing are also doable. Since the Serializer is the one with the schema, the schema is decoupled from the struct implementing Serialize, so you just have to make sure that the schema you use to serialize doesn't include the field with the skip_serializing attribute. (Of course, reading the value back would require a different schema that includes the field 🙁 but it is doable with different read and write schemas. If you use skip_serializing but not skip_deserializing, then maybe that's just what you're signing yourself up for.)

skip_deserializing should work more easily, since if we first read the value into an avro Value using the schema, then when we convert from Value to the struct using the de.rs implementation, it should be fine to ignore the value since doing so won't cause issues with parsing the rest of the record in the same way that ignoring a field does when serializing.

skip_field does not provide the value, so we use the default from the Avro Schema. This might need some more work! At the moment it is treated as an Option but probably it should be unpacked.

Ah, that's a very good point. That's a bummer that the value isn't provided in skip_field. Yeah, if the default value is None (meaning no default value was specified) then we're kind of stuck, since there's no way to know what should go in that field. Might be good to just return an Err in that case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Index-Out-Of-Bounds panic when using #[serde(skip_serializing_if = "Option::is_none")]
3 participants