Skip to content

java: Use Integers in addition to Longs #310

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from
Draft

Conversation

mpkorstanje
Copy link
Contributor

@mpkorstanje mpkorstanje commented Jul 17, 2025

⚡️ What's your motivation?

Currently, messages are a bit cumbersome to use in combination with Java. The generated code currently uses long values where Java would use integers. This leads to some silly code:

lines.add((int) (long) location.getLine());

Unfortunately, long values are needed for the timestamp and duration. So simply replacing longs with integers wouldn't work.

By specifying some upper bounds the code generation can make a more informed choice about the size of number to use. The choice for 2^31 - 1 is somewhat arbitrary but matches the maximum length of strings in Java and .Net. In practice, I would expect Gherkin documents to have a much more reasonable length though.

Partially resolves: cucumber/cucumber-jvm#3013

🏷️ What kind of change is this?

  • 💥 Breaking change (incompatible changes to the API)

♻️ Anything particular you want feedback on?

Does anyone have a better idea?

📋 Checklist:

Currently, messages are a bit cumbersome to use in combination with
Java. The generated code currently uses long values where Java would
use integers. This leads to some silly code:

```java
lines.add((int) (long) location.getLine());
```

Unfortunately, long values are needed for the timestamp and duration. So
simply replacing longs with integers wouldn't work.

By specifying some upper bounds the code generation can make a more
informed choice about the size of number to use. The choice for 2^31 - 1
is somewhat arbitrary but matches the maximum length of strings in Java
and .Net. In practice, I would expect Gherkin documents to have a much
more reasonable length though.
@mpkorstanje mpkorstanje requested a review from luke-hill July 17, 2025 23:32
@luke-hill
Copy link
Contributor

Is the codegen changes here also needed? Not sure where to review start/end in this PR that's all. Some of those changes seem like they might need a bit more investigating.

# Conflicts:
#	jsonschema/src/Attachment.json
#	jsonschema/src/Duration.json
#	jsonschema/src/GherkinDocument.json
#	jsonschema/src/Meta.json
#	jsonschema/src/Pickle.json
#	jsonschema/src/Source.json
#	jsonschema/src/SourceReference.json
#	jsonschema/src/TestCase.json
#	jsonschema/src/TestCaseStarted.json
#	jsonschema/src/Timestamp.json
#	perl/lib/Cucumber/Messages.pm
#	python/src/cucumber_messages/_messages.py
#	ruby/lib/cucumber/messages/attachment.rb
#	ruby/lib/cucumber/messages/gherkin_document.rb
#	ruby/lib/cucumber/messages/meta.rb
#	ruby/lib/cucumber/messages/pickle.rb
#	ruby/lib/cucumber/messages/source.rb
#	ruby/lib/cucumber/messages/source_reference.rb
#	ruby/lib/cucumber/messages/step_match_argument.rb
#	ruby/lib/cucumber/messages/test_case_started.rb
#	ruby/lib/cucumber/messages/test_step.rb
@mpkorstanje
Copy link
Contributor Author

Is the codegen changes here also needed? Not sure where to review start/end in this PR that's all. Some of those changes seem like they might need a bit more investigating.

With #311 merged the diff should be much smaller.

A good place to start would be with the changes to the json schema. I've added minimum and maximum values.
Those changes are picked picked up by the codegen. For most languages nothing happens. For Java it is used to determine if an int or long should be used.

Copy link
Contributor

@luke-hill luke-hill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed about half of the files. I think some bits are ok. But other bits might cause issues where they change the response structures. But I'm not 100% sure without digging in again

The JSON schema stuff all seems great though, no issues there.

@@ -53,11 +53,11 @@ def nullable?(property_name, schema)
end

def scalar?(property)
property.key?('type') && language_translations_for_data_types.key?(property['type'])
property.key?('type') && select_language_translations_for_data_types(property['type'], property)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will return false or a non-boolean value which I haven't checked downstream if this matters, but it could

}
def select_language_translations_for_data_types(type, property)
if type == 'integer'
if property['maximum'] and property['maximum'] <= 2147483647
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if property['maximum'] and property['maximum'] <= 2147483647
if property['maximum']&.between?(1, 2147483647)

Copy link
Contributor Author

@mpkorstanje mpkorstanje Jul 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Java, for the choice between integer and long only the upper bound matters. They're both signed.

end

def type_for(parent_type_name, property_name, property)
if property['$ref']
property_type_from_ref(property['$ref'])
elsif property['type']
property_type_from_type(parent_type_name, property_name, property, type: property['type'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is the only location for this? If so this seems good. I think a lot of the initial work I did for codegen was converting into a generic structure - I didn't go through and tidy up. So good catch if so

@@ -21,12 +21,12 @@ def format_description(raw_description, indent_string: '')

private

def language_translations_for_data_types
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review here but for all generic implementors. Is the only usage of this in property_type_from_type? If so then I figure the change here is ok, but it does more heavy lifting upfront

@@ -68,12 +68,12 @@ def default_value(class_name, property_name, property, schema)
super(class_name, property_name, property)
end

def language_translations_for_data_types
def select_language_translations_for_data_types(type, property)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is never consumed here we can use _property to be memory efficient

Suggested change
def select_language_translations_for_data_types(type, property)
def select_language_translations_for_data_types(type, _property)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Messages should interact more gracefully with Java
2 participants