| object ${Message}Test { | ||
| val scala = TestCases.make${Message}Scala | ||
|
|
||
| val java = protos.${Message}.toJavaProto(scala) |
There may be a problem with data preparation here: the Java protobuf is created by converting the Scala proto rather than by parsing bytes.
As a result, `lazy_fields` affects Java serialization performance:
| serializeJava | Java |
|---|---|
| LargeStringMessage | 2,584 ns/op |
| LazyFieldsStringMessage (same as LargeStringMessage but lazy_fields: true) | 1,111 ns/op |
I think this is caused by `toJavaProto` forcing the bytes to be written when `lazy_fields` is enabled (ProtobufGenerator#348L). Either way, it does not look like a clean Java protobuf benchmark.
What about changing this line to `val java = Protos.${Message}.parseFrom(bytes)`?
| java: Boolean = true | ||
| ): Unit = { | ||
| ops.mkdir ! ops.pwd / 'results | ||
| val benchmarks0 = if (benchmarks.nonEmpty) benchmarks else testNames |
I had a problem running this script with a fresh Ammonite version; this odd-looking code got the script to run.
Also, the `scalapb` argument value is ignored further on, so I had to hardcode the snapshot version in benchmarks/project/plugins.sbt.
| @@ -0,0 +1,81 @@ | |||
| # Agents | |||
I can delete it if it is not necessary.
Hello!
This is a revival of the work to support lazy fields (previous attempt #1376).
Context
In Java protobuf, string fields are handled using LazyField. This mechanism stores the field data as a ByteString and only parses it into a UTF-8 string when the corresponding getter is called. When a message containing such fields is serialized, the raw `ByteString` is written directly, without performing UTF-8 encoding or decoding (source).

Unlike Java protobuf, scalapb does not lazily serialize strings. Accordingly, this is an opportunity to reduce the overhead if the following factors coincide:
- `string` fields;

Such usage patterns are quite common for cloud-native applications.
The essence of the changes
Generating `LazyField[String]` for string fields if `scalapb.options.lazy_fields` is enabled. `LazyField[T]` contains the original `ByteString` and lazily parses the value on demand. Introduces implicit conversions for convenient use of generated case classes.

Example:
How it works with parsing and serialization:
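A minimal sketch of the idea described above, not the code this PR generates. `LazyString` and its methods are hypothetical names chosen for illustration, and a plain `Array[Byte]` stands in for protobuf's `ByteString` so the snippet runs without dependencies. It shows the two paths: serialization reuses the original bytes, and decoding happens only when the value is read.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import scala.language.implicitConversions

// Hypothetical stand-in for LazyField[String]; Array[Byte] replaces ByteString here.
final class LazyString private (private val raw: Array[Byte]) {
  // Decoded at most once, on first access.
  lazy val value: String = new String(raw, UTF_8)

  // Serialization path: returns the stored bytes, no UTF-8 re-encoding.
  // (A real implementation would use an immutable ByteString instead of exposing the array.)
  def bytes: Array[Byte] = raw
}

object LazyString {
  // Parsing path: keep the wire bytes as they are and defer decoding.
  def fromBytes(raw: Array[Byte]): LazyString = new LazyString(raw)

  // Construction from user code: encode once, up front.
  def fromString(s: String): LazyString = new LazyString(s.getBytes(UTF_8))

  // Implicit conversions so call sites can keep using plain strings.
  implicit def lazyToString(l: LazyString): String = l.value
  implicit def stringToLazy(s: String): LazyString = fromString(s)
}
```

With this shape, a message that is parsed and immediately re-serialized never touches `value`, so the UTF-8 decode and encode steps are skipped entirely; code that reads the field pays for decoding once.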
Benchmarks
New benchmarks have been added: `roundTripScala` and `roundTripJava`. They test the full proto lifecycle: parsing and serialization. I was confused by the fact that transforming data in `object ${Message}Test` using the `toJavaProto` method affects the performance results. In my generated code, this method forces `ByteString` usage during Java proto preparation, so the comparison is not entirely fair. The results have also improved for existing benchmarks, but I wanted a clearer comparison.

Looks great. More than 3x speedup 🚀 Of course, scalapb is faster than Java proto even without additional improvements.
Questions
`toJavaProto` in data preparation for benchmarks (`object ${Message}Test`)?