Skip to content

Conversation

mzabaluev
Copy link
Contributor

@mzabaluev mzabaluev commented May 28, 2024

Resolves #276 in a way amenable to destructuring.

Introduce opt-in support for type-checked enumeration fields, selectively enabled for message fields matched by patterns given in Config::typed_enum_fields methods.

For the matching fields, the type of the generated Rust message field is not i32, but depends on the syntax of the enum definition:

For enums defined in files with proto2 syntax, the representation is a closed enum, that is, the generated Rust enum type. On decoding, unknown values of the field are ignored.

For enums defined in files with proto3 syntax, the representation is a generic OpenType wrapper, parameterized with the generated enum type:

pub enum OpenEnum<T> {
    /// A known value defined in the proto
    Known(T),
    /// Unknown value as decoded from the wire
    /// (uses an opaque i32 wrapper to prevent arbitrary construction)
    Unknown(Unknown),
}

When Protobuf editions are supported (see #1031), these representations will support the enum_type feature behavior for closed and open enums, respectively.

In an improvement over #1061, this allows convenient matching of enum field values as part of the message. There are also convenience methods to fallibly extract the known value in an Option or Result.

@caspermeijn
Copy link
Contributor

Please add a description that explains your proposed change. That will make reviewing easier.

@mzabaluev mzabaluev marked this pull request as ready for review May 28, 2024 21:08
@QuentinPerez
Copy link
Contributor

Do you think that this logic could be applied to the oneof field ?

@mzabaluev
Copy link
Contributor Author

mzabaluev commented May 29, 2024

Do you think that this logic could be applied to the oneof field ?

I believe it works differently for oneofs: if an unknown field number is encountered in a message and it's not a known oneof variant or a regular field, the field is ignored. So the oneof field that would get the value if the variant field were described in the proto would get None instead.

@caspermeijn
Copy link
Contributor

Will this break compatibility with proto2/closed enums? Personally, I don't use proto2, so I am not sure whether it is properly supported at all.

@caspermeijn
Copy link
Contributor

I think it makes sense to provide a migration guide.

@caspermeijn
Copy link
Contributor

What is the error message for a i32 field with #[prost(enumeration)]? Can we detect that scenario and print a nice error message?

@caspermeijn
Copy link
Contributor

I would like to see some tests for OpenEnum.

Comment on lines 278 to 282
self.#ident.get(#take_ref key).cloned().and_then(|x| { x.known() })
}
#[doc=#insert_doc]
pub fn #insert(&mut self, key: #key_ty, value: #ty) -> ::core::option::Option<#ty> {
self.#ident.insert(key, value as i32).and_then(|x| {
let result: ::core::result::Result<#ty, _> = ::core::convert::TryFrom::try_from(x);
result.ok()
})
self.#ident.insert(key, value.into()).and_then(|x| { x.known() })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it beneficial to provide a get and insert function with the new wrapper? I feel like the map can be used directly with the helper functions provided by OpenEnum.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As can be seen in the generated code itself, these methods provide considerable convenience for the most common use cases.

Copy link
Contributor Author

@mzabaluev mzabaluev Jun 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the insert method could be changed to return the possible old value as Option<OpenEnum<#ty>> so the caller can choose the fallback behavior for unknown values, but that API would be inconsistent with the get method. If it's to be changed so, the only added convenience there would be the .into() conversion call.

Comment on lines 298 to 325
pub fn #get(&self) -> #ty {
::core::convert::TryFrom::try_from(self.#ident).unwrap_or(#default)
self.#ident.unwrap_or(#default)
}

#[doc=#set_doc]
pub fn #set(&mut self, value: #ty) {
self.#ident = value as i32;
self.#ident = value.into();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it still beneficial to provide a get, set and push functions with the new wrapper? I feel like the field can be used directly with the helper functions provided by OpenEnum.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some convenience in the setter method:

msg.set_kind(Kind::Foo)

is nicer and more type-ahead friendly than

msg.kind = Kind::Foo.into()

The same applies to the push method for repeated fields.

The getter hides a bit much opinionated behavior, in my opinion, so it might be better to leave it to OpenEnum helper methods to explicitly decide on how unknown values should be treated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing to consider here is porting the existing code. If these methods get used instead of direct access to fields, which I assume happens a lot now because the fields are just i32 or Option<i32>, updating to the version that generates OpenEnum would not require changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the better API is to have some sort of set operation on OpenEnum. Maybe something like:

msg.kind.set(Kind::Foo)

Well, this change will break existing code in many ways, so I think we should directly go for the best API

Ty::String => quote!(&str),
Ty::Bytes(..) => quote!(&[u8]),
Ty::Enumeration(..) => quote!(i32),
Ty::Enumeration(..) => unreachable!("an enum should never be queried for its ref type"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain me why it should never be queried by ref?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is preempted by logic in prost-derive. For one example, the rust_ref_type method is used on map field's key type, which cannot be constructed as an enum.

pub fn module(&self) -> Ident {
match *self {
Ty::Enumeration(..) => Ident::new("int32", Span::call_site()),
Ty::Enumeration(..) => Ident::new("enumeration", Span::call_site()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to use self.as_str() for enumerations as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that would give "enum", but we can't use that as a module name. OTOH, maybe the current behavior of Ty::as_str should be changed: "enum" is not a real type name anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the result of as_str to "enumeration" will change the output of Debug and Display for Ty. I don't know if this is a problem; it does not break any tests.

Comment on lines +757 to +557
for value in values {
encode(tag, value, buf);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could int32::encode_repeated be reused here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, unless int32::encode_repeated is generalized to take an impl IntoIterator that produces i32. Which might be a good idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that would need a breaking change for encode_repeated which I don't think is worth the little code duplication at this point.

Comment on lines +767 to +578
if values.is_empty() {
return;
}

encode_key(tag, WireType::LengthDelimited, buf);
let len: usize = values
.iter()
.map(|value| encoded_len_varint(value.to_raw() as u64))
.sum();
encode_varint(len as u64, buf);

for value in values {
encode_varint(value.to_raw() as u64, buf);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could int32::encode_packed be reused here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, not worth changing the signature of int32::encode_packed for this.

Comment on lines 794 to 606
if wire_type == WireType::LengthDelimited {
// Packed.
merge_loop(values, buf, ctx, |values, buf, ctx| {
let mut value = Default::default();
merge(WireType::Varint, &mut value, buf, ctx)?;
values.push(value);
Ok(())
})
} else {
// Unpacked.
check_wire_type(WireType::Varint, wire_type)?;
let mut value = Default::default();
merge(wire_type, &mut value, buf, ctx)?;
values.push(value);
Ok(())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could int32::merge_repeated be reused here?

Copy link
Contributor Author

@mzabaluev mzabaluev Nov 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, definitely not. This function merges into a passed in &mut Vec<OpenEnum<T>>.

Comment on lines +823 to +624
key_len(tag) * values.len()
+ values
.iter()
.map(|value| encoded_len_varint(value.to_raw() as u64))
.sum::<usize>()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could int32::encoded_len_repeated be reused here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not without breaking changes, as explained above.

Comment on lines +834 to +639
if values.is_empty() {
0
} else {
let len = values
.iter()
.map(|value| encoded_len_varint(value.to_raw() as u64))
.sum::<usize>();
key_len(tag) + encoded_len_varint(len as u64) + len
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could int32::encoded_len_packed be reused here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not without breaking changes, as explained above.

Comment on lines +189 to +209
let mut raw = 0;
<i32 as Message>::merge_field(&mut raw, tag, wire_type, buf, ctx)?;
*self = OpenEnum::from_raw(raw);
Ok(())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same function as encoding::enumeration::merge, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, <i32 as Message>::merge_field calls to skip_field if the tag argument is not 1.

@mzabaluev
Copy link
Contributor Author

Will this break compatibility with proto2/closed enums? Personally, I don't use proto2, so I am not sure whether it is properly supported at all.

I believe prost has never been conformant with closed enums: the raw values have been left in the message as is.
I think we could try to support closed enums in two ways:

  1. Always represent an enum field as OpenEnum, but in the closed enum case, decode unknown values as Known(Default::default()). This leaves the possibility of producing unknown values on encoding.
  2. Represent closed enums with their generated Rust enum type. Decode unknown values as Default::default().

I prefer option 2, even though it requires more work. Protobuf edition 2023 has closed enums as a feature, so they need to be supported even if we ignore proto2.

@mzabaluev
Copy link
Contributor Author

I think it makes sense to provide a migration guide.

What would be a good place for it? I can add a section to the README.

@mzabaluev
Copy link
Contributor Author

I would like to see some tests for OpenEnum.

I'm going to add at least one good example/doctest on the type.

@caspermeijn
Copy link
Contributor

Have thought some more about this PR: I don't want to break users in this way. At least not now.

I suggest making this an option in prost-build so that we can experiment with the API without breaking existing users. That way, interested users can opt in, and we don't have to create a perfect OpenEnum API on the first try.

Once we feel good about the new API, we can think about changing the default.

Please look at bytes for a good example of changing the generated data type.

@ArjixWasTaken
Copy link

ArjixWasTaken commented Jun 28, 2024

Instead of introducing a new type, a simple Result<T, i32> sounds more logical to me.
enum-repr-derive follows this approach when parsing an enum from i32

@mzabaluev
Copy link
Contributor Author

Instead of introducing a new type, a simple Result<T, i32> sounds more logical to me.

It's weird to have a Result as a struct field. The names of variants and methods of Result are less than intuitively applicable: it's not necessarily an error to receive an unknown enum value from the wire, so we should give the API users a speed bump to decide how to deal with them. There are convenience methods and the TryInto impl to convert the OpenEnum value to a Result if that's the chosen approach.

Another benefit of introducing a new type is for the add-on macros and code generators that derive something on structs generated by prost-build. These could deal with the OpenEnum type in some specific way, even without access to the proto descriptor data for the field. Doing the same (e.g. defining generic trait impls) for Result would feel like too much overloading.

@caspermeijn
Copy link
Contributor

How are default values handles in this solution? Especially default values set for a specific field in the proto file.

@mzabaluev
Copy link
Contributor Author

mzabaluev commented Nov 12, 2024

How are default values handles in this solution? Especially default values set for a specific field in the proto file.

Good question. Maybe there should be a const generic parameter for this, but it will then crop up everywhere.

Updated: explained below

pub fn typed(&self) -> TokenStream {
if let DefaultValue::Enumeration(_) = *self {
quote!(#self as i32)
quote!(::prost::OpenEnum::from(#self))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This handles the custom default values for message fields.

@mzabaluev
Copy link
Contributor Author

mzabaluev commented Nov 13, 2024

The generated getter methods handled per-field defaults for optional fields. For non-optional fields, the default is initialized before merging field values from the wire.

I'm not sure how best to proceed here. Suppose we have this generated message:

pub struct Msg {
    #[prost(enumeration = "AnEnum", optional, default = "AnEnum::B")]
    pub field: Option<OpenEnum<AnEnum>>
}

It would be nice to have API implementing the default value behavior for proto2 and edition 2023.

The protobuf guide recommends a getter method that returns the default value, but whether out-of-range values are exposed is up to the enum_type protobuf feature. The "hazzer" method can be generated to check presence, but in the currently produced code one can just use Option::is_some.

@mzabaluev mzabaluev force-pushed the open-enum-as-enum branch 3 times, most recently from 905fbc8 to 9bd8272 Compare November 17, 2024 19:49
@mzabaluev
Copy link
Contributor Author

I suggest making this an option in prost-build so that we can experiment with the API without breaking existing users. That way, interested users can opt in, and we don't have to create a perfect OpenEnum API on the first try.

I have addressed this and most of the other comments in recent commits. There is now a Config::typed_enum_fields method to opt into type-checked fields.

The proto2 behavior (and the closed enums feature in future editions) is also implemented for enums, albeit with some current limitations: state available to the code generator does not track which syntax the enum types were defined in, that is needed for proto2 ← proto3 includes. I'll try to resolve this.

Getter methods are retained for the optional fields for convenience and to implement per-field default values. Protobuf guidelines also insist on there being getters returning non-nullable values.

@mzabaluev
Copy link
Contributor Author

@caspermeijn please remove the breaking change label as the revised implementation is purely opt-in.

@mzabaluev
Copy link
Contributor Author

mzabaluev commented May 30, 2025

The proto2 behavior (and the closed enums feature in future editions) is also implemented for enums, albeit with some current limitations: state available to the code generator does not track which syntax the enum types were defined in, that is needed for proto2 ← proto3 includes. I'll try to resolve this.

This has been implemented in b6e2b0b
I should add a test for the enum type "stickiness" across includes soon.

@mzabaluev mzabaluev requested a review from caspermeijn May 30, 2025 20:10
mzabaluev added 3 commits May 30, 2025 23:55
Introduce OpenEnum to represent fields of Protobuf enumerated types
with the possibility of unknown values.

A new enum_type annotation is supported in the prost attribute
inside derives, which allows to specify the type-checked representation
of enum types in message fields and oneof variants.
The accepted values are "open" or "closed".

Use OpenEnum in generated code for enum-typed fields of messages
instead of i32, if the `enum_type` annotation is set to "open".

OpenEnum features get and set methods, which are meant to be
the primary ways to get and set open enumeration fields,
instead of the getters and setters generated by Message derive.

Add typed_enum_fields method to prost-build configuration, which
allows type-checked representation of enumerations in fields of
message structs and variants of oneof enums. The argument and the
invocation order works like with the boxed method.

Depending on the syntax (and preparing for the future support of
editions), the type-checked representation can be closed (for proto2)
or open (for proto3). The former is represented by the generated
enum type itself, while the latter is represented by OpenEnum
wrapping the enum type.
Enum definitions in proto2 files need to be treated
as closed, even when imported from proto3 files.
Use a data structure in the code generator Context
to track whether an enum is closed or not.
It feels inconsistent to have OpenEnum::known returning an Option,
but OpenEnum::known_or* returning a Result.
Follow the naming convention in the Option API and rename the methods
returning a Result to ok_or and ok_or_else.
@mzabaluev mzabaluev force-pushed the open-enum-as-enum branch from aaec1d5 to d8afac6 Compare May 30, 2025 20:55
@mzabaluev
Copy link
Contributor Author

One troublesome thing in this design is the public integer in the Unknown variant, which can be assigned arbitrary values including those of known variants. This can be treated as garbage in, garbage out, but it may be cleaner to use something like UnknownEnumValue and make the wrapped integer private so that bogus values can't be constructed by mistake. This will make matching on unknown values more cumbersome, but that seems like an uncommon use case.

mzabaluev added 2 commits June 7, 2025 21:37
Test that when proto3 enum types are used in proto2 syntax messages,
the enum type is treated as an open enum, while proto2 enums are closed.
Put the opaque type Into the OpenEnum::Unknown variant to disallow
construction of OpenEnum::Unknown outside of the crate. This prevents
accidental use of OpenEnum::Unknown with known values.
@ArjixWasTaken
Copy link

ArjixWasTaken commented Jun 8, 2025

@mzabaluev could you update the PR's description to be more detailed?

I see a distinction between Open/Closed/Int enums, and you've said that this entire feature is now opt-in, I believe such details need to be advertised at the description of the PR and not have people read all the comments.

e.g.

  1. How do you opt-in?
    1.5 Can you mix different enum types together? (aka is it global or not)
  2. What enum types does this introduce?
  3. Is a "closed" enum w/o a wrapper? (does it throw an error when parsing if it's invalid)

And other stuff you may find important to point out

PS: Thanks for keeping this PR alive for so long, I really really appreciate that as a user of this library.

@mzabaluev
Copy link
Contributor Author

@ArjixWasTaken I have updated the description, thank you for the suggestion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

A more ergonomic enumeration representation

4 participants