@GreyCat (Member) commented Nov 2, 2025

Adds an enumToStr implementation, allowing .to_s on an enum value for direct conversion to a string, implementing the request in kaitai-io/kaitai_struct#979.

This is mostly useful for verbose to-string implementations, with the goal of eventually replacing -webide-representation, which allows enum-to-string conversions.

The approach taken is the simplest, most bare-bones one: pet.to_s, where pet == animal::cat, is expected to return "cat" in the original .ksy spelling (lower underscore case).

Implementations so far available for:

  • Java
  • JavaScript
  • Python
  • Ruby

Other languages so far use ??? as the implementation, so the compiler itself still builds, but it will fail at runtime with scala.NotImplementedError if .to_s on an enum is used when targeting them.

Companion test PR: kaitai-io/kaitai_struct_tests#137

@generalmimon (Member)

@GreyCat Thanks. I haven't looked at the implementation, but I agree that we need this; otherwise, to-string won't be able to replace -webide-representation. It was also proposed some time ago: kaitai-io/kaitai_struct#979

@GreyCat (Member, Author) commented Nov 2, 2025

Thanks, added to the summary! The implementation is relatively simple; feel free to take a look when you're available.

override def enumToInt(v: Ast.expr, et: EnumType): String =
  s"int(${translate(v)})"  // emits int(<expr>): the integer value of the enum
override def enumToStr(v: Ast.expr, et: EnumType): String =
  s"${translate(v)}.name"  // emits <expr>.name: the Enum member name
@generalmimon (Member), Nov 2, 2025

This will only work for known (defined) enum values and will fail for unknown enum values, which are represented as plain int in Python and thus don't have the .name attribute (accessing it raises AttributeError).
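To illustrate the failure mode, here is a minimal Python sketch. Animal is a hypothetical stand-in for a generated enum, and to_enum_or_int is a hypothetical helper that assumes unknown values are kept as plain int, as described above; neither is the actual Kaitai Struct runtime API.

```python
from enum import IntEnum


class Animal(IntEnum):
    # Hypothetical stand-in for a generated enum (enum "animal" in .ksy)
    dog = 4
    cat = 7


def to_enum_or_int(enum_cls, raw):
    """Hypothetical helper: known values become enum members,
    unknown values are kept as a plain int."""
    try:
        return enum_cls(raw)
    except ValueError:
        return raw


known = to_enum_or_int(Animal, 7)
unknown = to_enum_or_int(Animal, 123)

print(known.name)  # cat
try:
    print(unknown.name)
except AttributeError as err:
    # int has no .name, so a translation to <expr>.name breaks here
    print("AttributeError:", err)
```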

Since we use IntEnum as the base class for the enums, it isn't super simple to implement this in a way that works for both known and unknown enum values. On Python 3.11 and later, str(enum_val) gives you the stringified integer value, since IntEnum tries to be compatible with int (before 3.11, it gave ClassName.member instead, which isn't what we want either). This is nicely explained in https://docs.python.org/3/library/enum.html#notes:
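A quick check of what str() does on a plain IntEnum member. Animal is again a hypothetical stand-in for a generated enum, and the result depends on the Python version:

```python
import sys
from enum import IntEnum


class Animal(IntEnum):
    # Hypothetical stand-in for a generated enum
    dog = 4
    cat = 7


# Python 3.11 made IntEnum.__str__ behave like int.__str__ ("7");
# earlier versions use Enum.__str__ ("Animal.cat").
print(sys.version_info[:2], str(Animal.cat))
print(str(123))  # an unknown value kept as plain int gives "123"
```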

IntEnum, StrEnum, and IntFlag

These three enum types are designed to be drop-in replacements for existing integer- and string-based values; as such, they have extra limitations:

  • __str__ uses the value and not the name of the enum member
  • __format__, because it uses __str__, will also use the value of the enum member instead of its name

If you do not need/want those limitations, you can either create your own base class by mixing in the int or str type yourself:

>>> from enum import Enum
>>> class MyIntEnum(int, Enum):
...     pass

or you can reassign the appropriate str(), etc., in your enum:

>>> from enum import Enum, IntEnum
>>> class MyIntEnum(IntEnum):
...     __str__ = Enum.__str__

Actually, we'll want to override __str__ with our own implementation that returns just the member name, without the enum class name as a prefix (which enum.Enum.__str__ adds, giving ClassName.member), as shown in the example for https://docs.python.org/3/library/enum.html#enum.Enum.__str__:

    def __str__(self):
        return self.name

Then the compiler can translate enum_val.to_s simply as str(enum_val), which returns the enum member name for known values and the stringified integer for unknown values.
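A minimal sketch of that suggestion. Animal is a hypothetical generated enum with the suggested __str__ override, and to_enum_or_int is a hypothetical helper that keeps unknown values as plain int; neither name comes from the actual compiler or runtime.

```python
from enum import IntEnum


class Animal(IntEnum):
    # Hypothetical generated enum
    dog = 4
    cat = 7

    # Return just the member name: no "Animal." prefix (Enum.__str__)
    # and no int-style output (IntEnum.__str__ on 3.11+).
    def __str__(self):
        return self.name


def to_enum_or_int(enum_cls, raw):
    # Hypothetical: known values become members, unknown stay plain int.
    try:
        return enum_cls(raw)
    except ValueError:
        return raw


# What the compiler could emit for pet.to_s: str(pet)
for raw in (7, 123):
    pet = to_enum_or_int(Animal, raw)
    print(str(pet))  # "cat", then "123"
```

Members stay int-compatible (Animal.cat == 7 still holds), so only the string conversion changes.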

@GreyCat (Member, Author)

It's an interesting design question.

Up until now, we've respected per-language design, which pretty much boils down to some languages retaining the undeclared integer value (e.g. C/C++, JavaScript, Python) and others (e.g. Java or Ruby) having no default way to store it:

  • Java will pretty much have null for invalid enum values, and it's rather hard to do anything about that.
  • Ruby will have nil in our current implementation, but, to be fair, we can change that.

This means that if we try to validate it with tests, we won't get consistent results across languages.

I can think of 3 possible choices here for handling invalid enum values:

  1. Don't do anything: leave the behavior language-specific; some will blow up with an NPE, and we don't care.
  2. Lowest common denominator: consistently return a fixed string like <UNKNOWN> or <INVALID> in all languages.
  3. Have similar but slightly different values:
    • For languages that keep the value, return something like unknown(123)
    • For languages that can't keep it, return something like unknown()

If we go with (2) or (3), we can write a test: for (2) by checking equality, for (3) by checking that the result starts with "unknown(".
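For Python, which keeps the raw integer, option (3) could look roughly like the following sketch. enum_to_s and Animal are hypothetical names, and the unknown(...) format is just the placeholder proposed above:

```python
from enum import Enum, IntEnum


class Animal(IntEnum):
    # Hypothetical generated enum
    dog = 4
    cat = 7


def enum_to_s(value):
    """Hypothetical option (3): member name for known values,
    "unknown(<int>)" for raw ints that weren't resolved to a member."""
    # Check Enum first: an IntEnum member is also an int.
    if isinstance(value, Enum):
        return value.name
    return f"unknown({value})"


print(enum_to_s(Animal.cat))  # cat
print(enum_to_s(123))         # unknown(123)
```

The order of the checks matters: testing for int first would route known members to the unknown(...) branch too. A cross-language test for (3) would then only assert the "unknown(" prefix, since languages that can't keep the value would produce unknown().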

Any preferences?
